Does my MySQL schema make sense? Any recommendations? - mysql

I need to create a database that stores the following content (about 5,765 entries total): http://s18.postimg.org/s73exwemf/Capture.jpg
I'm using MySQL Workbench to create my schema. So far I have one table with the following columns:
EPSG_CODE - INT, PK, NN
CRS_NAME - CHAR(50), UQ
CRS_TYPE - ENUM('Projected', 'Geographic 2D', 'Geographic 3D', 'Geocentric', 'Vertical', 'Compound')
PROJ_FILE - CHAR(800)
Do my datatypes make sense? Generally, I will retrieve the CRS name, type, and proj file contents using the EPSG code. But sometimes the only information available may be the CRS name. That's why I made CRS_NAME a unique index.
Does that make sense? I'm new to SQL and I'm enjoying it so far.

The following is for the most part personal preferences developed over almost 10 years of working with databases (MySQL mostly).
CRS_NAME: Unique key sounds appropriate to me.
CRS_TYPE: I tend to stay away from enum in the database. Instead, I suggest a separate CRS_TYPE table, with a CRS_TYPE_ID field in the main table in place of the enum. I would not make CRS_TYPE.ID an auto-increment; ideally, you want it to reflect the values used in an enum in whatever programming language you might work with. (Technically, the additional table is only necessary for documentation and easier reporting purposes.) A sketch follows.
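A minimal sketch of that lookup table, assuming illustrative names (crs_type, id, name) and hand-assigned ids that can mirror an application-side enum:

CREATE TABLE crs_type (
  id   TINYINT UNSIGNED NOT NULL PRIMARY KEY,  -- no AUTO_INCREMENT: values are assigned by hand
  name VARCHAR(20) NOT NULL UNIQUE
);

INSERT INTO crs_type (id, name) VALUES
  (1, 'Projected'), (2, 'Geographic 2D'), (3, 'Geographic 3D'),
  (4, 'Geocentric'), (5, 'Vertical'), (6, 'Compound');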
PROJ_FILE: TINYTEXT, MEDIUMTEXT, TEXT, etc. (or the equivalent BLOBs) might be a better option. CHAR(800) uses 800 bytes (or more, depending on the character set) whether it holds nothing or is full. VARCHAR(800) could be better from a space-used perspective, but with the MyISAM engine it makes data rows dynamic, which slows queries. Regardless of the engine used, TEXT and BLOB types take up only as much room as they need and don't "fragment" tables the way VARCHAR does. The downside is that they are a little more complicated to search within and index.
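Putting both suggestions together, a revised main table might look something like this (names are illustrative, and TEXT is used for PROJ_FILE per the note above):

CREATE TABLE crs (
  epsg_code   INT NOT NULL PRIMARY KEY,
  crs_name    VARCHAR(50) NOT NULL UNIQUE,
  crs_type_id TINYINT UNSIGNED NOT NULL,  -- replaces the ENUM
  proj_file   TEXT,                       -- takes only the space it needs
  FOREIGN KEY (crs_type_id) REFERENCES crs_type(id)
);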

Related

Multiple possible data types for the same attribute: null entries, EAV, or store as varchar?

I'm creating a database for combustion experiments. Each experiment has some scientific metadata which I call 'details'. For example ('Fuel', 'C2H6') or ('Pressure', 120). Because the same detail names (like 'Fuel') show up a lot, I created a table just to store the names and units. Here's a simplified version:
CREATE TABLE properties (
  id INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50) NOT NULL,
  units NVARCHAR(15) NOT NULL DEFAULT 'dimensionless'
);
I also created a table called 'details' which maps 'properties' to values.
CREATE TABLE details (
  id INT AUTO_INCREMENT PRIMARY KEY,
  property_id INT NOT NULL,
  value VARCHAR(30),
  FOREIGN KEY(property_id) REFERENCES properties(id)
);
This isn't ideal because the value attribute is sometimes a chemical name and sometimes a float. In the future, there may even be new entries that have integer values. Storing everything in a VARCHAR seems wasteful. Since it'll be hard to change later, I want to make the right decision now.
I've been researching this for hours and have considered four options:
1. Store everything as VARCHAR under value (simplest to develop).
2. Use an EAV model (most complicated to develop).
3. Create a column for each type (value_float, value_int, value_char) and have plenty of NULL entries.
4. Use the JSON datatype.
Looking into each one, it seems like they're all bad in different ways. (1) is bad since it takes up extra space and I have to do extra operations to parse strings into numeric values. (2) is bad because of the huge increase in complexity (four extra tables and a lot more join operations), plus I hear EAV is to be avoided. (3) is a middle-ground in complexity, but there will be two NULL values for each table entry. (4) seems similar to (1), and I'm not sure how it might be better or worse.
I don't expect to have huge growth on this database or millions of entries. It just needs to be fast and searchable for researchers. I'm willing to have more backend complexity for a better/faster user experience.
By now I realize that there aren't many clear-cut answers in database design. I'm simply asking for some insight into these four options, or perhaps another option I haven't thought of.
EDIT: Added JSON as an option.
Well, you have to sacrifice something: disk space, performance, the specific-vs-general dimension, or the easy-vs-complex-to-develop dimension. Choose a mix suitable for your needs and situation.

I solved this in 2000 with a generalized kind of EAV solution, this way: the basic record had the common properties shared by the majority of events, then joins to value-less properties (an associative table), and the very specific properties/values I stored in a BLOB in XML-like tags. That way I combined the frequent properties with the very specific ones. As that was intended as a VERY GENERAL solution, you probably don't need it. I'd sacrifice space; it's cheap today. Who cares if you take more space than is "correct according to data modeling theory"? OK, the data model will be ugly, so what?

You'll still need to decide on the specific-vs-general dimension - how specific attributes will be handled: either as dedicated columns (yes, if they are repeated often) or in a Property-TypeOfProperty-Value type of table.
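As a rough sketch of that hybrid idea in the question's combustion-experiment domain, with illustrative names (experiments, fuel, pressure, extras): frequent properties become dedicated columns, and the rare, very specific ones go into a text blob:

CREATE TABLE experiments (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  fuel     VARCHAR(30),  -- properties shared by most experiments get dedicated columns
  pressure DOUBLE,
  extras   TEXT          -- rare, experiment-specific properties as XML/JSON-like text
);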

Is it correct to have a BLOB field directly in the main table?

Which one is better: having a BLOB field in the same table or having a 1-TO-1 reference to it in another table?
I'm making a MySQL database whose main table is called item(ID, Description). This table is consulted by a program I'm developing in VB.NET, which lets the user double-click a specific item obtained with a query. Once the item's dedicated form is opened, I would like to show an image stored in a BLOB field, a sort of item preview. The problem is that I don't know where it is better to create this BLOB field.
Assuming to have a table like this: Item(ID, Description, BLOB), will the BLOB field affect the database performance on queries like:
SELECT ID, Description FROM Item;
If yes, what do you think about this solution:
Item(ID, Description)
Images(Item, File)
Where Images.Item references to Item.ID, and File is the BLOB field.
You can add the BLOB field directly to your main table, since BLOB values are not stored in-row and already require a separate look-up to retrieve their contents. Your dependent table is needless.
BUT another, and preferred, way is to store in your database table only a pointer (the path to the file on the server) to your image file. This way you can retrieve the path and access the file from your VB.NET application.
To quote the documentation about blobs:
Each BLOB or TEXT value is represented internally by a separately allocated object. This is in contrast to all other data types, for which storage is allocated once per column when the table is opened.
In simpler terms, the blob's contents aren't stored inside the table's row; only a pointer is - which is pretty similar to what you're trying to achieve with the secondary table. To make a long story short: there's no need for another table; MySQL already does the same thing internally.
Most of what has been said in the other answers is correct. I'll start from scratch, adding some caveats.
The two-table, 1-1, design is usually better for MyISAM, but not for InnoDB. The rest of my Answer applies only to InnoDB.
"Off-record" storage may happen to BLOB, TEXT, and 'large' VARCHAR and VARBINARY, almost equally.
"Large" columns are usually stored "off-record", thereby providing something very similar to your 1-1 design. However, by having InnoDB do the work usually leads to better performance.
The ROW_FORMAT and the size of the column makes a difference.
A "small" BLOB may be stored on-record. Pro: no need for the extra fetch when you include the blob in the SELECT list. Con: clutter.
Some ROW_FORMATs cut off at 767 bytes.
Some ROW_FORMATs store 20 bytes on-record; this is just a 'pointer'; the entire blob is off-record.
etc, etc.
Off-record is beneficial when you need to filter out a bunch of rows, then fetch only a few. Also, when you don't need the column.
As a side note, TINYTEXT is possibly useless. There are situations where the 'equivalent' VARCHAR(255) performs better.
Storing an image in the table (on- or off-record) is arguably unwise if that image will be used in an HTML page. HTML is quite happy to request the <img src=...> from your server or even some other server. In this case, a smallish VARCHAR containing a URL is the 'correct' design.
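For instance, a minimal sketch of that URL-pointer design, adding an illustrative image_url column to the question's item table:

CREATE TABLE item (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  description VARCHAR(255),
  image_url   VARCHAR(255)  -- e.g. '/images/items/1234.jpg'; the file itself lives on disk or a CDN
);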

Most efficient way to search a database with more than a billion records?

My client has a huge database containing just three fields:
Primary key (an unsigned number)
Name (multi-word text)
Description (VARCHAR, up to 1000 characters)
This database has over a billion entries. I have no previous experience handling such large amounts of data.
He wants me to design an AJAX interface (like Google's) to search this database. My current queries are as slow as a turtle.
What is the best way to search text fields in such a large database? And if the user misspells something in the interface, how can I return what they wanted?
If you are using FULLTEXT indexes, you're writing your queries correctly, and the speed at which the results are returned is still not adequate, you are entering territory where MySQL may simply not be sufficient for you.
You may be able to tweak settings and purchase enough RAM to make sure that your entire data set fits 100% in memory; the performance gains there can be huge.
I'd definitely recommend looking into tweaking your MySQL configuration. We've had some silly settings in the past; operating-system defaults tend to really suck!
However, if you have trouble at that point, you can:
Create a separate table containing each word (indexed) along with a record id that it refers to; see the sketch after this list. This will allow you to search on single words.
Use a different system that's optimized for solving this problem. Unless my information is now outdated, the 2 engines that are the most popular for solving this problem are:
Sphinx
Solr / Lucene
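A minimal sketch of the separate word table mentioned above (table and column names are illustrative):

CREATE TABLE keywords (
  word      VARCHAR(64)  NOT NULL,
  record_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (word, record_id)  -- the left prefix of this key serves word lookups
);

-- a single-word search then becomes an indexed equality lookup:
SELECT record_id FROM keywords WHERE word = 'database';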
If your table is MyISAM, then you can put a FULLTEXT index on the Name and Description fields:
CREATE TABLE articles (
  id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
  Name VARCHAR(200),
  Description TEXT,
  FULLTEXT (Name,Description)
);
Then you can use queries like:
SELECT * FROM articles
WHERE MATCH (Name,Description) AGAINST ('database');
You can find more info at http://docs.oracle.com/cd/E17952_01/refman-5.0-en/fulltext-search.html
Before doing any of the above, you might want to back up (or at least make a copy of) your database.
You can't. The only fast search in your scenario would be on the primary key, since that's most likely to be the index. Text search is as slow as a turtle.
In all seriousness, you have a few solutions:
If you have to stick with MySQL, you'll have to redesign your schema. It's hard to give you a good recommendation without knowing the requirements. One solution would be to index keywords in a separate table.
Another solution is to switch to a different search engine, you can find suggestions in other questions here such as: Fast SQL Server search on 40M text records

Does splitting TEXT fields into multiple tables provide performance optimization in multi-language application?

I'm building a project and I have a question about MySQL databases. The application is multi-language, and we are wondering whether we will get better performance if we split up the different types of text fields (VARCHAR, TEXT, MEDIUMTEXT) into different tables, or whether it is better to create one table with just a text field.
With this question and the multi-language constraint in mind, I am wondering if performance will improve if I split the different types of text fields into separate tables. When you have just one table with all the texts and the language, you can search it easily (give me the text with this value in an item column and that language). When you have different tables for different types of text, you save space in the database, because you don't need a full TEXT column for a VARCHAR(200), but you need multiple tables to create the connection between the item, the type of text, and the languages you have for your text.
What do you think is best? Or are there some possibilities that I haven't considered?
I find it better for performance reasons to keep columns with BLOB and TEXT data types in a separate table from the other data types, even if it breaks normalization.
Consider a person table with the columns name VARCHAR, address VARCHAR, dob DATE, and picture BLOB. A picture can easily be about 1 MB, while the remaining columns may not take more than 1 KB. Imagine how many blocks of data need to be read, even if you only want to list the name and address of people living in a certain city, if you keep everything in the same table.
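A sketch of that split, with illustrative names (person_picture and the column sizes are my own):

CREATE TABLE person (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(100),
  address VARCHAR(200),
  dob     DATE
);

CREATE TABLE person_picture (
  person_id INT NOT NULL PRIMARY KEY,  -- 1-to-1 with person
  picture   MEDIUMBLOB,                -- a ~1 MB image fits comfortably (16 MB max)
  FOREIGN KEY (person_id) REFERENCES person(id)
);

-- listing names and addresses never touches the large picture data:
SELECT name, address FROM person WHERE address LIKE '%Springfield%';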
If you are not bound to MySQL, I would suggest using a text-search engine such as Apache Lucene if you want to do full-text searches, because as far as I know, MySQL does not provide as much full-text search performance as Lucene can.
In case you are bound to MySQL, let me try to provide some information based on current definition of the problem (which is actually not much yet).
MySQL reference documentation states that:
Instances of BLOB or TEXT columns in the result of a query that is processed using a temporary table causes the server to use a table on disk rather than in memory because the MEMORY storage engine does not support those data types.
So, if you run your queries using SELECT * on a table that contains a text field, you can either separate the queries that really need the text field from the ones that don't, or move the text field to a separate table. Keeping the text field in a secondary table costs you the extra overhead of the duplicated key storage and the indexes for that secondary table. On the other hand, depending on your database design, you may currently be paying for unnecessary index updates that would be eliminated by moving the text field to another table - but this is just a proposition, since we don't know your schema and data access patterns.

Best way to store 'extra' user data in MySQL?

I'm adding a new feature to my user module for my CMS and I've hit a road block... Or I guess, a fork in the road, and I wanted to get some opinions from stackoverflow before I commit to anything.
Basically I want to allow admins to add new, 'extra' user fields that users can fill out on registration, edit in their profile, and/or be controlled by other modules. An example of this would be a birthday field, a lengthy description of themselves, or maybe points the user has earned on the site. Needless to say, the data stored will be varied and can range from large amounts of text, to a small integer value. To make matters worse - I want there to be the option to search this data.
With that out of the way - what would be the best way to do this? Right now I'm leaning towards having a table with the following columns.
userid, refFieldID, varchar, tinyint, smallint, int, text, date, datetime, etc.
I would prefer this as it would make searching significantly faster, and the reference table (Which holds all of the field's data, such as the name of the field, whether it's searchable or not, etc.) can reference which column should be used when storing data for that field.
The other idea, which was suggested to me and which I've seen used in other solutions (vBulletin being one, although I have seen others whose names escape me at the moment), is to just have the userid, a reference id, and a MEDIUMTEXT field. I don't know enough about MySQL to say this with any certainty, but this method seems like it would be slower to search and possibly have a larger overhead.
So which method would be 'best'? Is there another method I'm missing? Whichever method I end up using, it needs to be fast to search, not massive (A tiny bit of overhead is fine), and preferably allow complex queries used against the data.
I agree that a key-value table is probably the best solution. My first inclination would be to just store a text column, like vBulletin did. But, if you wanted to add the ability for the data store to be a bit more extensible and searchable like you've laid out, I might suggest:
1 medium/longtext or medium/longblob field for arbitrary text/binary storage (whatever is stored + an overhead of 3-4 bytes for the string length). The only reason to choose medium over long is to limit what can be stored to 2^24 bytes (16.7 MB) versus 2^32 bytes (4 GB).
1 integer (4 bytes) or bigint (8 bytes)
1 datetime (8 bytes)
Perhaps 1 float or double (4-8 bytes) for floating point storage
These fields will allow you to store nearly any type of data in the table without inflating the width of the table** (as a VARCHAR would) and without redundant storage (like having TINYINT and MEDIUMINT and so on). The text stored in the longtext field can still be reasonably searched using a FULLTEXT index or a regular limited-length prefix index (e.g. INDEX (longtext_storage(8))).
** all blob values, such as longtext, are stored independently from the main table.
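As a sketch, that suggestion could look like the following; the table and column names (user_field_values, field_id, etc.) are my own, not from the original module. Each row populates only the column matching the field's declared type, and the reference table records which column to read:

CREATE TABLE user_field_values (
  userid         INT UNSIGNED NOT NULL,
  field_id       INT UNSIGNED NOT NULL,   -- references the field-definition (reference) table
  text_value     MEDIUMTEXT,              -- arbitrary text/binary storage
  int_value      BIGINT,
  float_value    DOUBLE,
  datetime_value DATETIME,
  PRIMARY KEY (userid, field_id),
  INDEX idx_text (text_value(8))          -- limited-length prefix index for text searches
);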
One technique that might work for you is to store this arbitrary data as text, in some notation like JSON, XML, or YAML. This decision depends on how you'll need to access the data: if you only look up each user's full chunk of user data, it could be ideal. If you need to run SQL queries on specific fields in the user data, you'll need to use a pure SQL or a hybrid approach.
Many of the newer, highly scalable "NoSQL" systems seem to favor JSON data (e.g., MongoDB, CouchDB, and Project Voldemort). It's nice and terse, and you can create arbitrarily complex structures including maps (JSON objects) and lists (JSON arrays).
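For illustration, a minimal sketch of that notation-as-text approach (user_data and its columns are my own naming), where each user's full chunk comes back in one indexed read:

CREATE TABLE user_data (
  userid INT UNSIGNED NOT NULL PRIMARY KEY,
  data   TEXT  -- e.g. '{"birthday": "1990-05-01", "points": 42}' stored as JSON text
);

-- the whole chunk is fetched with a single primary-key lookup:
SELECT data FROM user_data WHERE userid = 123;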