I've got the following situation: I want to store data, which represents, if a user is following another user. Another table, which I cannot touch, stores the users, where the username is the primary key (unfortunatly no id...).
The fact is, if one user follows another one, it doesn't mean, that the other one is following the first one.
Right now, I designed the table with two varchar's (128) and a UNIQUE INDEX on these two varchar's which represent the usernames.
The problem is, that I need to parse some old-styled system now, and I finished like 15% and I've got 550k entries on this table already.
The index is bigger then 16MB, and the data just 14MB.
What could I do, to save this data in a better way? As said, I cannot use id's instead of the usernames, because the user-table uses the username as primary key.
As you have noticed, creating a seperate index on all columns essentially forces MySQL to duplicate all data in the index.
Instead of creating a seperate unique index, you can create a primary key consisting of both of your fields. MySQL uses the primary key as a clustered index making sure your uniqueness constraint is still satisfied without increasing the size of your database.
You might consider building your own index table that contains ID > username.
You could then use the ID's to map the followers.
This will cause for some extra overhead if you want to retrieve all the data.
Related
I want to structure a MySQL table with many of the the usual columns such as index as an integer, name as a varchar, etc. The thing about this table is I want to include a column that has an unknown number of entries. I think the best way to do this (if possible) is to make one of the columns an array that can be changed as any entry in a database can. Supposing when the record is created it has 0 entries. Then later, I want to add 1 or more. Maybe sometime later still, I might want to remove 1 or more of these entries.
I know I could create the table with individual columns for each of these additions, but I may want as many as a hundred or more for one record. This seems very inefficient and very difficult to maintain. So the bottom-line question is can a column be defined as a dynamic array? If so, how? How can things be selectively added to or removed from it?
I'll take a stab in the dark and guess maybe make a table contain another table. I've never heard of this because my experience with MySQL has been mostly casual. I make databases and dynamic websites because I want to.
The way to do this in a relational database is to create another table. One column of that table will have a foreign key pointing to the primary key of that table that should have had the array (or multiple columns, if the primary key consists of more than one row). Another column has the values that'd be found in the array. If order matters, a third column would store some other values indicating the ordinality.
Something along the lines of:
CREATE TABLE elbat_array
(id integer,
elbat integer -- or whatever type the primary key column has
NOT NULL,
value text, -- or whatever type the values should have
ordinality integer
NOT NULL, -- optional
PRIMARY KEY (id),
FOREIGN KEY (elbat)
REFERENCES elbat -- the other table
(id) -- and its primary key column
ON DELETE CASCADE,
UNIQUE (ordinality));
To add to the "array", insert rows into that table. To remove, delete rows. There can be as many as zero rows (i.e. "array" elements) or as much as there's disk space (unless you hit any limit of the DBMS before, but if such a limit applies it would be very large, so usually that should not be a problem).
Also worth a read in that context: "Is storing a delimited list in a database column really that bad?" While it's not about an array type in particular, on the meta level it discusses why the values in a column should be atomic. An array would violate that as well as a delimited list does.
During the creation of tables using mysql on phpmyadmin, I always find an issue when it comes to primary keys and their auto-increments. When I insert lines into my table. The auto_increment works perfectly adding a value of 1 to each primary key on each new line. But when I delete a line for example a line where the primary key is 'id = 4' and I add a new line to the table. The primary key in the new line gets a value of 'id = 5' instead of 'id = 4'. It acts like the old line was never deleted.
Here is an example of the SQL statement:
CREATE TABLE employe(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30) NOT NULL
)
ENGINE = INNODB;
How do I find a solution to this problem ?
Thank you.
I'm pretty sure this is by design. If you had IDs up to 6 in your table and you deleted ID 2, would you want the next input to be an ID of 2? That doesn't seem to follow the ACID properties. Also, if there was a dependence on that data, for example, if it was user data, and the ID determined user IDs, it would invalidate pre-existing information, since if user X was deleted and the same ID was assigned to user Y, that could cause integrity issues in dependent systems.
Also, imagine a table with 50 billion rows. Should the table run an O(n) search for the smallest missing ID every time you're trying to insert a new record? I can see that getting out of hand really quickly.
Some links you might like to read:
Principles of Transaction-Oriented Database Recovery (1983)
How can we re-use the deleted id from any MySQL-DB table?
Why do you care?
Primary keys are internal row identifiers that are not supposed to be sexy or good looking. As long as they are able identify each row uniquely, they serve their purpose.
Now, if you care about its value, then you probably want to expose the primary key value somewhere, and that's a big red flag. If you need an external, visible identifier, you can create a secondary column with any formatting sequence and values you want.
As a side note, the term AUTO_INCREMENT is a bit misleading. It doesn't really mean they increase one by one all the time. It just mean it will try to produce sequential numbers, as long as it is possible. In multi-threaded apps that's usually not possible since batches or numbers are reserved per thread so the row insertion sequence may end actually not following the natural numbering. Row deletions have a similar effect, as well as INSERT with roll backs.
Primary keys are meant to be used for joining tables together and
indexing, they are not meant to be used for human usage. Reordering
primary key columns could orphan data and wreck havoc to your queries.
Tips: Add another column to your table and reorder that column to your will if needed (show that column to your user instead of the primary key).
I am new to MSAccess so I'm not sure about this; do I have to have a primary key for every single table in my database? I have one table which looks something like this:
(http://i108.photobucket.com/albums/n32/lurker3345/ACCESSHELP.png?t=1382688844)
In this case, every field/column has a repeating term. I have tried assigning the primary key to every field but it returns with an error saying that there is a repeated field.
How do I go about this?
Strictly speaking, Yes, every row in a relational database should have a Primary Key (a unique identifier). If doing quick-and-dirty work, you may be able to get away without one.
Internal Tracking ID
Some database generate a primary key under-the-covers if you do not assign one explicitly. Every database needs some way to internally track each row.
Natural Key
A natural key is an existing field with meaningful data that happens to identify each row uniquely. For example, if you were tracking people assigned to teams, you might have an "employee_id" column on the "person" table.
Surrogate Key
A surrogate key is an extra column you add to a table, just to assign an arbitrary value as the unique identifier. You might assign a serial number (1, 2, 3, …), or a UUID if your database (such as Postgres) supports that data type. Assigning a serial number or UUID is so common that nearly every database engine provides a built-in facility to help you automatically create such a value and assign to new rows.
My Advice
In my experience, any serious long-term project should always use a surrogate key because every natural key I've ever been tempted to use eventually changes. People change their names (get married, etc.). Employee IDs change when company gets acquired by another.
If, on the other hand, you are doing a quick-and-dirty job, such as analyzing a single batch of data to produce a chart once and never again, and your data happens to have a natural key then use it. Beware: One-time jobs often have a way of becoming recurring jobs.
Further advice… When importing data from a source outside your control, assign your own identifier even if the import contains a candidate key.
Composite Key
Some database engines offer a composite key feature, also called compound key, where two or more columns in the table are combined to create a single value which once combined should prove unique. For example, in a "person" table, "first_name" and "last_name", and "phone_number" fields might be unique when considered together. Unless two people married and sharing the same home phone number while also happening to each be named "Alex" with a shared last name! Because of such collisions as well as the tendency for meaningful data to change and also the overhead of calculating such combined values, it is advisable to stick with simple (single-column) keys unless you have a special situation.
If the data doesn't naturally have a unique field to use as the primary key, add an auto-generated integer column called "Id" or similar.
Read the "how to organize my data" section of this page:
http://www.htmlgoodies.com/primers/database/article.php/3478051
This page shows you how to create one (under "add an autonumber primary key"):
http://office.microsoft.com/en-gb/access-help/create-or-remove-a-primary-key-HA010014099.aspx
In you use a DataAdapter and a Currency Manager, your tables must have a primary key in order to push updates, additions and deletions back to the database. Otherwise, they will not register and you will receive an error.
I lost one week figuring that one out until I added this to the Try-Catch-End Try block: MsgBox(er.ToString) which mentioned "key". From there, I figured it out.
(NB : Having a primary key was not a requisite in VB6)
Not having a primary key usually means your data is poorly structured. However, it looks like you're dealing with summary/aggregate data there, so it's probably doesn't matter.
I am very new to database concepts and currently learning how to design a database. I have a table with below columns...
this is in mysql:
1. Names - text - unique but might change in future
2. Result - varchar - not unique
3. issues_id - int - not unique
4. comments - text - not unique
5. level - varchar - not unique
6. functionality - varchar - not unique
I cannot choose any of the above columns as primary keys as they might change in future. So i created a Auto-Increment id as names_id. I also have a GUI( a JTable) that shows this table and user updates Result,issues_id and comments based on the Names.Names here is a big text column. I cannot display names_id in the GUI as it does not make any sense in the GUI. Now when the user updates the database after giving inputs for column2,3,4 in the GUI i used the below query to update the database, i couldnt use names_id in where clause as the Jtable's row_id does not match with the names_id because not all the rows are loaded onto JTable.
update <tablename> set Result=<value>,issues_id=<value>,comments=<value>
where Names=<value>;
I could get the database updated but i want to know if its ok to update the database without even using the PK. how efficient is this? what purpose does the surrogate key serve here?
It is perfectly acceptable to update the database using a where condition that doesn't reference the primary key.
You may want to learn about indexes and constraints, though. You query could end up updating more than one row, if multiple rows have the same name. If you want to ensure that they are unique, then you can create a unique constraint on the column.
A primary key always creates an index on that column. This index makes access fast. If there is no index on name, then the update will need to scan the entire table to look at all names. You can make this faster by building an index on the field.
I've checked everything for errors: primary key, uniqueness, and type. Access just doesnt seem to be able to link the 2 fields i have in my database. can someone please take a look?
http://www.jpegtown.com/pictures/jf5WKxKRqehz.jpg
Thanks.
Your relationship diagram shows that you've made the ID fields your primary key in all your tables, but you're not using them for your joins. Thus, they serve absolutely no purpose. If you're not going to use "surrogate keys" (i.e., a meaningless ID number that is generated by the database and is unique to each record, but has absolutely no meaning in regard to the data in your table), then eliminate them. But if you're going to use "natural keys" (i.e., a primary key constructed from a set of real data fields that together are going to be unique for each record), you must have a unique compound index on those fields.
However, there are issues with both approaches:
Surrogate Keys: a surrogate PK makes each record unique. That is you could have a record for David Fenton with ID 1 and a record for David Fenton with ID 2. If it's the same David Fenton, you've got duplicate data, but as far as your database knows, they are unique.
Natural Keys: some types of entities work very well with natural keys. The best such are where there's a single field that identifies the record uniquely. An example would be "employee type," where values might be "associate, manager, etc." In that case, it's a very good candidate for using the natural key instead of adding a surrogate key. The only argument against the natural key in that case is if the data in the candidate natural key is highly volatile (i.e., it changes frequently). While every modern database engine provides "CASCADE UPDATE" functionality (i.e., if the value in the PK field changes, all the tables where that field is a Foreign Key are automatically updated), this imposes a certain amount of overhead and can be problematic. For single-column keys, it's unlikely to be an issue. Now, except for lookup tables, there are very few entities for which a natural key will be a single column. Instead, you have to create a compound index, i.e., an index that spans multiple data fields. In the index dialog in Access table design, you create a compound key by giving it a name in the first column, and then adding multiple rows in the second column (from the dropdown list of fields in your table). The drawback of this is that if any of the fields in your compound unique index are unknown, you won't get uniqueness. That is, if a field has a Null in two records, and the rest of the fields are identical, this won't be counted as a conflict of uniqueness because Null never equals Null. This is because Null doesn't mean "empty" -- it means "Unknown."
Allen Browne has explained everything you need to know about Nulls:
Nulls: Do I Need Them?
Common Errors with Null
In your graphic, you show that you are trying to link the Company table with the PManager table. The latter table has a CompanyID field, and your Company table has a unique index on its ID field, so all you need is a link from the ID field of the Company table to the CompanyID field of the PManager table. For your example to work (which would be useless, since you already have a unique index on the ID field), you'd need to create a unique compound key spanning both ID and ShortName in the Company table.
Additionally, if ShortName is a field that you want to be unique (i.e., you don't want two company records to have the same ShortName), you should add a unique index to it, whether or not you still use the ID field as your primary key. This brings me back to item #1 above, where I described a situation where a surrogate key could lead you to enter duplicate records, because uniqueness is established by the surrogate key along. Any time you choose to use a surrogate key, you must also add a unique compound index on any combination of data fields that needs to be unique (with the caveat about Null fields as outlined in item #2).
If you're thinking "surrogate keys mean more indexes" you're correct, in that you have two unique indexes on the same table (assuming you don't have the Null problem). But you do get substantial ease of use in joining tables in SQL, as well as substantially less duplication of data. Likewise, you avoid the overhead of CASCADE UPDATE. On the other hand, if you're viewing a child table with a natural foreign key, you don't need to join to the parent table to be able to identify the parent record, because the data that identifies that record is right there in the foreign-key fields. That lack of a need for a join can be a major performance gain in certain scenarios (especially for the case where you'd need an outer join because the foreign key can be Null).
This is actually quite a huge topic, and it's something of a religious argument. I'm firmly in the surrogate key camp, but I use natural keys for lookup tables where the key is a single column. I don't use natural keys for any other purpose. That said, where possible (i.e., no Null problems) I also have a unique index on the natural key.
Hope this helps.
Actually you need an index on the name fields, on both sides
However, may I suggest that you have way too many joins? In general there should only be one join from one table to the next. It is rare to have more than one join between tables, and exceedingly rare to have more than two.
Have a look at this link:
http://weblogs.asp.net/scottgu/archive/2006/07/12/Tip_2F00_Trick_3A00_-Online-Database-Schema-Samples-Library.aspx
Notice how all of the tables are joined together by a single relationship?
Each of the fields labeled PK are primary keys. These are AUTONUMBER fields. Each of the fields labeled FK are foreign keys. These are indexed Number fields of type Integer. The Primary Keys are connected to the Foreign Keys in a 1 to many relationship (in most cases).
99% of the time, you won't need any other kind of joins. The trick is to create tables with unique information. There is a lot of repeated information in your database.
A database that is reorganized in this manner is called a "normalized" database. There are lots of good examples of these at http://www.databaseanswers.org/data_models/
Just join on the CompanyID. You could also get rid of the Company field in PManager.
I did the following and the problem was solved (I face the same problem of referential integrity in access).
I exported data from both tables in Access to Excel. Table1
was containing Cust Code and basic information about the company.
Cust Code as Primary key.
Table2 was containing all information about who the
customers associated with that company.
I removed all duplicates from Table2 exported to excel.
Using Vlookup I checked and found that there are 11
customers code not present in Table1.
I added those codes in Access Table. I linked by
referential integrity and Problem was solved.
Also look for foreign key if it does not work.
You need to create an INDEX. Perhaps look for some kind of create index button and create an index on CompanyID