store 300 digit number in sql - mysql

Which data type can I use to store a really big integer in SQL? I am using phpMyAdmin to view the data and a Java program for storing and retrieving values. I am working with bilinear maps, which use random numbers drawn from Zp, where p is a very large prime number, and then perform "raised to" (exponentiation) operations on those numbers.
I want to store some numbers in database like public keys. What data type can I use for table columns in SQL for such values?

You could store them as strings of decimal digits using type CHARACTER. While this does waste some space, an advantage is that the database will be easier for humans to understand.
You could store them as raw binary big-endian values using type BLOB. This is the most efficient for software to access and takes up the least space. However, humans will not be able to easily query the database for these values or understand them in dumps.
Personally, I would opt for the blob unless there's a real need for the database to be understandable by humans using standard query tools. If you can't get around needing to administer the database with tools that don't understand your data format, then just use decimal values in text.
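A minimal sketch of both options in Java, assuming the program already holds the values as java.math.BigInteger (the class name BigIntCodec is made up for illustration). The decimal string would be bound to a character column with PreparedStatement.setString, and the big-endian bytes to a BLOB/VARBINARY column with PreparedStatement.setBytes.

```java
import java.math.BigInteger;

// Hypothetical helper: converts between BigInteger and the two storage forms
class BigIntCodec {
    // Decimal digits for a CHAR/VARCHAR column
    static String toDecimal(BigInteger n) {
        return n.toString();
    }

    static BigInteger fromDecimal(String digits) {
        return new BigInteger(digits);
    }

    // Big-endian two's-complement bytes for a BLOB/VARBINARY column
    static byte[] toBigEndian(BigInteger n) {
        return n.toByteArray();
    }

    static BigInteger fromBigEndian(byte[] bytes) {
        return new BigInteger(bytes);
    }
}
```

Note that toByteArray() produces two's-complement bytes with a sign bit, so a positive value can gain a leading 0x00 byte (255 becomes 0x00 0xFF); new BigInteger(byte[]) reverses exactly that encoding.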

For MySQL, VARCHAR(300) CHARACTER SET ascii.
VAR, assuming the numbers won't always be exactly 300 digits long.
CHAR -- a character type; there is no big advantage in using BLOB.
ascii -- no need for utf8 involvement.
DECIMAL won't work because it is limited to 65 digits.
The space taken will be 2+length bytes (302 in your example), where the 2 bytes hold the length that VAR types need.
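To put numbers on the space trade-off, here is a rough Java sketch (hypothetical class BigIntStorageSize) that compares the VARCHAR cost stated above (2-byte length prefix plus one ASCII byte per digit) with the size of the binary form from BigInteger.toByteArray(). It assumes a non-negative value and ignores any row or page overhead the storage engine adds.

```java
import java.math.BigInteger;

// Hypothetical helper comparing text and binary storage sizes for a non-negative value
class BigIntStorageSize {
    // VARCHAR(300) CHARACTER SET ascii: 2-byte length prefix + 1 byte per digit
    static int varcharBytes(BigInteger n) {
        return 2 + n.toString().length();
    }

    // Binary form as produced by toByteArray() (includes a sign bit)
    static int binaryBytes(BigInteger n) {
        return n.toByteArray().length;
    }
}
```

For the largest 300-digit number, 10^300 - 1, this gives 302 bytes as text versus 125 bytes as binary, so the binary form is roughly 2.4 times smaller, at the cost of readability in phpMyAdmin.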

Related

Advice for storing dna sequence data in mysql

I am creating a database that will store DNA sequence data, which are strings like this: 'atcgatcgatcg' and protein sequence data that are also strings like this: 'MKLPKRML'.
I am a beginner in MySQL management, and I want to ask for the proper configuration of these columns in terms of data types, character set and collation. There will be around one million DNA and protein sequence rows, and I want string comparisons to be as fast as possible.
I've been reading about this problem, and I have these conclusions and doubts:
I could use VARCHAR(MAX) because the length of my strings will not be higher than 65,535 characters.
BLOB field comparisons are faster. Is BLOB better than VARCHAR in this case? I am also thinking about issues with retrieving the data, because it must be retrieved as a string, not as bytes.
Is it better to use latin1 instead of utf8? I am only storing letters of the alphabet, without special characters.
Thank you for your help!

Conversion of strings with binary values

This is a question of converting strings from DB2 to SQL Server.
On DB2 you can have a column that contains a mix of strings and binary data (e.g. using REDEFINES in COBOL to combine string and decimal values into a DB2 column).
This will have unpredictable results during data replication, as the binary zero (0x00) is treated as a string terminator (in the C family of programming languages).
Both SQL Server and DB2 are able to store binary zero in the middle of fixed-length char columns without any issue.
Does anyone have any experience with this problem? The way I see it, the only way to fix it is to amend the COBOL program and the database schema: if you have a column of 14 chars, where the first 10 are a string and the last 4 a decimal, split it into two columns containing one "part" each.
If you want to just transfer the data 1:1, I'd just create a binary(x) field of equal length, or varbinary(x) in case the length differs.
If you need to easily access the stored string and decimal values, you could create a number of computed columns that extract the string/decimal values from the binary(x) field and represent them as normal columns. This would allow you to do an easy 1:1 migration while having simple and strongly typed access to the contents.
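A sketch of what such an extraction does, written in Java rather than as SQL Server computed columns. It assumes a 14-byte record whose first 10 bytes are single-byte text and whose last 4 bytes hold a big-endian 32-bit binary integer; real COBOL fields may instead be EBCDIC text or packed decimal (COMP-3), which would need different decoding. LegacyRecord is a hypothetical name. Because the record is handled as bytes, an embedded 0x00 is just data, not a terminator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical view of a 14-byte legacy column: 10 bytes of text + 4-byte binary integer
class LegacyRecord {
    final String text;
    final int number;

    private LegacyRecord(String text, int number) {
        this.text = text;
        this.number = number;
    }

    static LegacyRecord parse(byte[] record) {
        if (record.length != 14) {
            throw new IllegalArgumentException("expected 14 bytes, got " + record.length);
        }
        // ISO-8859-1 maps every byte to a char; real DB2 data may be EBCDIC instead
        String text = new String(record, 0, 10, StandardCharsets.ISO_8859_1);
        // Assumes big-endian two's-complement binary; packed decimal would need its own decoder
        int number = ByteBuffer.wrap(record, 10, 4).getInt();
        return new LegacyRecord(text, number);
    }
}
```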
The optimal way would be to create strongly typed columns on the SQL Server database and then perform the actual migration either in COBOL or whatever script/system is used to perform the one time migration. You could still store a binary(x) to save the original value, in case a conversion error occurs, or you need to present the original value to the COBOL system.

Storing large prime numbers in a database

This problem struck me as a bit odd. I'm curious how you could represent a list of prime numbers in a database. I do not know of a single data type that would be able to accurately and consistently store a large amount of prime numbers. My concern is that when the prime numbers start to contain 1000s of digits, it might be a bit difficult to reference them from the database. Is there a way to represent a large set of primes in a DB? I'm quite sure that this topic has been approached before.
One of the issues that makes this difficult is that prime numbers cannot be broken down into factors. If they could, this problem would be much easier.
If you really want to store primes as numbers, and the thing stopping you is that "prime numbers cannot be broken down into factors", there is another option: store each prime as the list of its digits in some base (the successive remainders modulo that base), ordered by position.
Small example:
2831781 == 2*100^3 + 83*100^2 + 17*100^1 + 81*100^0
List is:
81, 17, 83, 2
In a real application it is useful to split by base 2^32 (32-bit integers), especially if the processing application stores the prime numbers as byte arrays.
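A minimal Java sketch of the split, assuming BigInteger values (PrimeParts is a hypothetical class). The list index plays the role of PART_ORDER and each element is a PRIME_PART_VALUE, least significant first, matching the 81, 17, 83, 2 list above.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a number into base-B "digits" for the PRIMES table
class PrimeParts {
    // Base 2^32, as suggested for real applications
    static final BigInteger BASE_2_32 = BigInteger.ONE.shiftLeft(32);

    // Least significant part first: index = PART_ORDER, element = PRIME_PART_VALUE
    static List<BigInteger> split(BigInteger value, BigInteger base) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        if (base.compareTo(BigInteger.valueOf(2)) < 0) {
            throw new IllegalArgumentException("base must be at least 2");
        }
        List<BigInteger> parts = new ArrayList<>();
        BigInteger rest = value;
        do {
            BigInteger[] qr = rest.divideAndRemainder(base);
            parts.add(qr[1]);   // remainder = current digit
            rest = qr[0];       // quotient = remaining higher digits
        } while (rest.signum() > 0);
        return parts;
    }
}
```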
Storage in DB:
create table PRIMES
(
PRIME_ID NUMBER not null,
PART_ORDER NUMBER(20) not null,
PRIME_PART_VALUE NUMBER not null
);
alter table PRIMES
add constraint PRIMES_PK primary key (PRIME_ID, PART_ORDER) using index;
Inserts for the example above (the ID 1647 is only an example):
insert into primes(PRIME_ID, PART_ORDER, PRIME_PART_VALUE) values (1647, 0, 81);
insert into primes(PRIME_ID, PART_ORDER, PRIME_PART_VALUE) values (1647, 1, 17);
insert into primes(PRIME_ID, PART_ORDER, PRIME_PART_VALUE) values (1647, 2, 83);
insert into primes(PRIME_ID, PART_ORDER, PRIME_PART_VALUE) values (1647, 3, 2);
The PRIME_ID value can be assigned from an Oracle sequence:
create sequence seq_primes start with 1 increment by 1;
Get the ID of the next prime number to insert:
select seq_primes.nextval from dual;
Select the parts of the prime number with a specified ID:
select PART_ORDER, PRIME_PART_VALUE
from primes where prime_id = 1647
order by part_order;
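Reading the number back is the reverse: take the rows in PART_ORDER (e.g. via ResultSet.getLong("PRIME_PART_VALUE") in JDBC) and recombine them. A Java sketch using Horner's rule, with a hypothetical PrimeJoin class and the parts already collected into a long[] array; the base must be the same one used when splitting.

```java
import java.math.BigInteger;

// Hypothetical helper: rebuild the number from its stored parts
class PrimeJoin {
    // parts[i] is PRIME_PART_VALUE for PART_ORDER = i (least significant first)
    static BigInteger join(long[] parts, long base) {
        BigInteger b = BigInteger.valueOf(base);
        BigInteger result = BigInteger.ZERO;
        // Horner's rule, starting from the most significant part
        for (int i = parts.length - 1; i >= 0; i--) {
            result = result.multiply(b).add(BigInteger.valueOf(parts[i]));
        }
        return result;
    }
}
```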
You could store them as binary data. They won't be human readable straight from the database, but that shouldn't be a problem.
Databases (depending on which) can routinely store numbers up to 38-39 digits accurately. That gets you reasonably far.
Beyond that you won't be doing arithmetic operations on them (accurately) in databases (barring arbitrary-precision modules that may exist for your particular database). But numbers can be stored as text up to several thousand digits. Beyond that you can use CLOB type fields to store millions of digits.
Also, it's worth noting that if you're storing sequences of prime numbers and your interest is in space compression of that sequence, you could start by storing the difference between one number and the next rather than the whole number.
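A small Java sketch of that difference (delta) encoding, assuming the primes arrive in ascending order as BigInteger values; the class name PrimeDeltas is illustrative. The first delta is the first prime itself (its difference from 0), so the sequence can be rebuilt with a running sum.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: store gaps between consecutive primes instead of the primes
class PrimeDeltas {
    static List<BigInteger> encode(List<BigInteger> primes) {
        List<BigInteger> deltas = new ArrayList<>();
        BigInteger prev = BigInteger.ZERO;   // first delta is the first prime itself
        for (BigInteger p : primes) {
            deltas.add(p.subtract(prev));
            prev = p;
        }
        return deltas;
    }

    static List<BigInteger> decode(List<BigInteger> deltas) {
        List<BigInteger> primes = new ArrayList<>();
        BigInteger running = BigInteger.ZERO;
        for (BigInteger d : deltas) {
            running = running.add(d);        // running sum restores each prime
            primes.add(running);
        }
        return primes;
    }
}
```

Gaps between consecutive primes are typically far smaller than the primes themselves (on average about ln p), so every delta after the first needs much less storage than the full number.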
This is a bit inefficient, but you could store them as strings.
If you are not going to use database-side calculations with these numbers, just store them as bit sequences of their binary representation (BLOB, VARBINARY etc.)
Here's my 2 cents' worth. If you want to store them as numbers in a database, then you'll be constrained by the maximum size of integer that your database can handle. You'd probably want a 2-column table, with the prime number in one column and its sequence number in the other. Then you'd want some indexes to make finding the stored values quick.
But you don't really want to do that, do you? You want to store humongous primes way beyond any integer data type you've even thought of yet. And you say that you are averse to strings, so it's binary data for you. (It would be for me too.) Yes, you could store them in a BLOB in a database, but what sort of facilities will the DBMS offer you for finding the n-th prime or checking the primeness of a candidate integer?
How would you design a suitable file structure? This is the best I could come up with after about 5 minutes' thinking:
1. Set a counter to 2.
2. Write the 2-bit primes in order (there are two: 2 and 3).
3. Write the last 2-bit prime again, to mark the end of the section containing the 2-bit primes.
4. Set the counter to counter+1.
5. Write the 3-bit primes in order. (I think there are two: 5 and 7.)
6. Write the last of the 3-bit primes again to mark the end of the section containing the 3-bit primes.
7. Go back to 4 and carry on mutatis mutandis.
The point about writing the last n-bit prime twice is to provide you with a means to identify the end of the part of the file with n-bit primes in it when you come to read the file.
As you write the file, you'll probably also want to make note of the offsets into the file at various points, perhaps the start of each section containing n-bit primes.
I think this would work, and it would handle primes up to 2^(the largest unsigned integer you can represent). I guess it would be easy enough to find code for translating a 325467-bit (say) value into a big integer.
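The scheme above can be sketched in Java as follows. For readability it builds a String of '0'/'1' characters instead of a packed binary file, and it writes every n-bit prime (so the 2-bit section holds both 2 and 3) followed by a repeat of the last one. PrimeBitFile is a hypothetical name. It assumes the input is the complete ascending list of primes up to some bound, so no section is empty, since a repeated value cannot mark an empty section; for a complete list this always holds, because by Bertrand's postulate there is a prime between 2^(n-1) and 2^n.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "repeat the last n-bit prime" layout, using '0'/'1' chars
class PrimeBitFile {
    // primes: complete ascending list of primes up to some bound
    static String encode(List<BigInteger> primes) {
        StringBuilder bits = new StringBuilder();
        if (primes.isEmpty()) {
            return "";
        }
        int maxBits = primes.get(primes.size() - 1).bitLength();
        for (int n = 2; n <= maxBits; n++) {
            BigInteger last = null;
            for (BigInteger p : primes) {          // simple scan; fine for a sketch
                if (p.bitLength() == n) {
                    bits.append(p.toString(2));    // exactly n binary digits
                    last = p;
                }
            }
            if (last == null) {
                throw new IllegalArgumentException("no " + n + "-bit primes in input");
            }
            bits.append(last.toString(2));         // repeat marks the end of the section
        }
        return bits.toString();
    }

    static List<BigInteger> decode(String bits) {
        List<BigInteger> primes = new ArrayList<>();
        int n = 2;                                 // current section width in bits
        int pos = 0;
        BigInteger prev = null;
        while (pos + n <= bits.length()) {
            BigInteger value = new BigInteger(bits.substring(pos, pos + n), 2);
            pos += n;
            if (value.equals(prev)) {
                n++;                               // repeated value: next section is one bit wider
                prev = null;
            } else {
                primes.add(value);
                prev = value;
            }
        }
        return primes;
    }
}
```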
Sure, you could store this file as a BLOB but I'm not sure why you'd bother.
It all depends on what kinds of operations you want to do with the numbers. If you just store and look them up, then just use strings and use a check constraint / domain data type to enforce that they are numbers. If you want more control, then PostgreSQL will let you define custom data types and functions. You can, for instance, interface with the GMP library to have correct ordering and arithmetic for arbitrary-precision integers. Using such a library will even let you implement a check constraint that uses a probabilistic primality test to check if the numbers really are prime.
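An application-side analogue of that check constraint, in Java (PrimeCheck is hypothetical, and this is not the PostgreSQL/GMP route described above): accept only digit strings, then run BigInteger.isProbablePrime, which returns false only for definite composites and, when it returns true, guarantees primality with probability exceeding 1 - 2^-certainty. A certainty of 0 or less makes isProbablePrime return true unconditionally, hence the guard.

```java
import java.math.BigInteger;

// Hypothetical application-side check: digits only, then a probabilistic primality test
class PrimeCheck {
    static boolean isStoredPrime(String s, int certainty) {
        if (certainty <= 0) {
            // isProbablePrime returns true for any value when certainty <= 0
            throw new IllegalArgumentException("certainty must be positive");
        }
        if (!s.matches("[0-9]+")) {
            return false;   // like a CHECK constraint rejecting non-numeric text
        }
        // false means definitely composite; true means prime with probability > 1 - 2^-certainty
        return new BigInteger(s).isProbablePrime(certainty);
    }
}
```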
The real question is actually whether a relational database is the correct tool for the job.
I think you're best off using a BLOB. How the data is stored in your BLOB depends on your intended use of the numbers. If you want to use them in calculations, I think you'll need to create a class or type to store the values as some variety of ordered binary value and allow them to be treated as numbers, etc. If you just need to display them, then storing them as a sequence of characters would be sufficient, and would eliminate the need to convert your calculable values to something displayable, which can be very time-consuming for large values.
Share and enjoy.
Probably not brilliant, but what if you stored them in some recursive data structure? You could store each one as an int, its exponent, and a reference to the lower-bit numbers.
Like the string idea, it probably wouldn't be very good for memory considerations. And query time would be increased due to the recursive nature of the query.

What is the appropriate data type to use for storing numbers with leading zeroes?

In Access 2003 I need to display numbers like this while keeping the leading zeroes:
080000
090000
070000
What data type should I use for this?
Use a string (or text, or varchar, or whatever string variant your particular RDBMS uses) and pad it with whatever character you need (here "0").
Key question:
Are the leading zeros meaningful data, or just formatting?
For instance, 07086 is my zip code, and the leading zero is meaningful, so US zip codes have to be stored as text.
Are the values '1', '01', '001' and '0001' considered to be unique, legal values or are they considered to be duplicates?
If the leading zero is not meaningful in your table, and is just there for formatting, then store the data as a number and format with leading zeros as needed for display purposes.
You can use the Format() function to do your formatting, as in this example query:
SELECT Format(number_field, "000000") AS number_with_leading_zeroes
FROM YourTable;
Also, number storage and indexing in all database engines I know of are more efficient than text storage and indexing, so with large data sets (100s of thousands of records and more), the performance drag of using text data type for numeric data can be quite large.
Last of all, if you need to do calculations on the data, you want them to be stored as numbers.
The key is to start from how the data is going to be used and choose your data type accordingly. One should worry about formatting only at presentation time (in forms and reports).
Appearance should never drive the choice of data types in the fields in your table.
If your real data looks like your examples and has a fixed number of digits, just store the data in a numeric field and use the Format/Input Mask attributes of the column in Access table design to display them with the padded zeros.
Unless you have a variable number of leading zeros, there is no reason to store them, and it is generally a bad idea. Unnecessarily using a text type can hurt performance, make it easier to introduce anomalous data, and make it harder to query the database.
Use a fixed-width character column with Unicode compression and a CHECK constraint to ensure exactly six numeric characters, e.g. in ANSI-92 Query Mode syntax:
CREATE TABLE IDs
(
ID CHAR(6) WITH COMPRESSION NOT NULL
CONSTRAINT uq__IDs UNIQUE,
CONSTRAINT ID__must_be_six_numeric_chars
CHECK (ID ALIKE '[0-9][0-9][0-9][0-9][0-9][0-9]')
);
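If the application also validates IDs before inserting, a Java equivalent of that CHECK pattern is a six-digit regular expression (SixDigitId is a hypothetical helper). Stored as text, '80000' is not silently padded; the constraint, and this check, simply reject it.

```java
// Hypothetical helper mirroring CHECK (ID ALIKE '[0-9][0-9][0-9][0-9][0-9][0-9]')
class SixDigitId {
    static boolean isValid(String id) {
        // exactly six ASCII digits; leading zeros are kept as part of the value
        return id != null && id.matches("[0-9]{6}");
    }
}
```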
Do you need to retain them as numbers within the table (i.e. do you think you will need to do aggregations within queries, such as SUM etc.)?
If not then a text/string datatype will suffice.
If you DO, then perhaps you need 2 fields: one to store the number [e.g. 80000], and one to store some metadata about how the value needs to be displayed, perhaps some sort of mask or formatting pattern [e.g. '000000'].
You can then use that pattern string to format the display of the number:
if you're using a .NET language, you can use System.String.Format() or a numeric type's ToString(format) overload (e.g. Int32.ToString("000000"))
if you're using Access forms/reports, Access uses very similar formatting patterns in its UI controls.
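A Java sketch of the two-field idea: the value and its display pattern are kept separately and combined only when formatting. java.text.DecimalFormat accepts the same '000000' style of pattern; MaskedNumber is a hypothetical class standing in for a row with the two columns.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical row with two fields: the number and how it should be displayed
class MaskedNumber {
    final long value;       // e.g. 80000, stored as a number so SUM etc. still work
    final String pattern;   // e.g. "000000", stored as display metadata

    MaskedNumber(long value, String pattern) {
        this.value = value;
        this.pattern = pattern;
    }

    String display() {
        // Locale.ROOT keeps ASCII digits regardless of the default locale
        DecimalFormat format = new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return format.format(value);
    }
}
```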

Storing very large integers in MySQL

I need to store a very large number (tens of millions) of 512-bit SHA-2 hashes in a MySQL table. To save space, I'd like to store them in binary form, rather than as a string of hex digits. I'm using an ORM (DBIx::Class), so the specific details of the storage will be abstracted from the code, which can inflate them to any object or structure that I choose.
MySQL's BIGINT type is 64 bits. So I could theoretically split the hash up amongst eight BIGINT columns. That seems pretty ridiculous though. My other thought was just using a single BLOB column, but I have heard that they can be slow to access due to MySQL's treating them as variable-length fields.
If anyone could offer some wisdom that will save me a couple of hours of benchmarking various methods, I'd appreciate it.
Note: Automatic -1 to anyone who says "just use postgres!" :)
Have you considered BINARY(64)? See the MySQL BINARY type.
Use the type BINARY(64)?
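For comparison, a Java sketch (the question uses Perl's DBIx::Class, so this is only an illustration) showing that a SHA-512 digest is exactly 64 bytes, which fits BINARY(64), while its hex string is 128 characters. With JDBC the raw bytes would be bound via PreparedStatement.setBytes and read back with ResultSet.getBytes. Sha512Binary is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: raw SHA-512 bytes for a BINARY(64) column
class Sha512Binary {
    static byte[] digest(String input) {
        try {
            return MessageDigest.getInstance("SHA-512")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // standard JDKs ship SHA-512; treat its absence as a configuration error
            throw new IllegalStateException(e);
        }
    }

    // Hex form, only to show the size difference versus the raw bytes
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```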