phpmyadmin is this Index Size? - mysql

Good day, PHP newbie here.
I use phpMyAdmin with MySQL. My problem is that I don't know what to put in the circled field shown in the picture below (I'd also like to know what it is and how to use it).
I proceeded without giving it a value. The field appears whenever I create a primary key or unique key on a table. Is this what they call the index size? I tried searching the internet and looked at other tutorials, but I don't see any mention of it (maybe I'm googling it wrong?).
So what does this do?
What value should I put here?
What is its default value?
When creating a unique key, what do veterans put in the index name field?
I hope you can enlighten me, because it's quite vague now that I'm studying this on my own. Thanks :)

The index size is quite simple. Imagine that you create an index on a VARCHAR(16) column. In this case the index entry is created with all 16 characters. But the strings may already differ within their first few characters. In such a case, you can set the index length to, e.g., 8.
This makes the index shorter, so it uses less memory and is therefore faster. If several entries in the column share the same first 8 characters, all of those rows are found via the index, and the row that really matches is then determined by comparing the individual rows. So if the number of entries found this way is very high, the whole thing will be slower.
To check how many entries would be equal under a shorter index, group by the prefix:
+----+-----+---------------------------------------------------+
| id | rev | content |
+----+-----+---------------------------------------------------+
| 2 | 1 | One hundred angels can dance on the head of a pin |
| 1 | 1 | The earth is flat |
| 3 | 2 | The earth is flat and rests on a bull's horn |
| 5 | 5 | The earth is flat type |
| 4 | 3 | X The earth is like a ball. |
+----+-----+---------------------------------------------------+
SELECT d.*,count(*) as cnt
FROM docs as d
GROUP BY SUBSTRING(d.content,1,8)
ORDER BY cnt DESC;
+----+-----+---------------------------------------------------+-----+
| id | rev | content | cnt |
+----+-----+---------------------------------------------------+-----+
| 1 | 1 | The earth is flat | 3 |
| 2 | 1 | One hundred angels can dance on the head of a pin | 1 |
| 4 | 3 | X The earth is like a ball. | 1 |
+----+-----+---------------------------------------------------+-----+
3 rows in set (0.00 sec)
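The same check can be repeated for several candidate lengths to see where the prefixes become unique. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for MySQL (SQLite itself has no prefix indexes, so only the counting is shown); the helper name distinct_prefixes is hypothetical. In MySQL, the chosen length would then go into the index definition, e.g. CREATE INDEX ... ON docs (content(8)).

```python
import sqlite3

# SQLite stands in for MySQL here; the rows mirror the docs sample above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, rev INTEGER, content TEXT)")
conn.executemany(
    "INSERT INTO docs (id, rev, content) VALUES (?, ?, ?)",
    [
        (1, 1, "The earth is flat"),
        (2, 1, "One hundred angels can dance on the head of a pin"),
        (3, 2, "The earth is flat and rests on a bull's horn"),
        (4, 3, "X The earth is like a ball."),
        (5, 5, "The earth is flat type"),
    ],
)


def distinct_prefixes(conn, length):
    """Count how many distinct values the first `length` characters take."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT substr(content, 1, ?)) FROM docs", (length,)
    ).fetchone()
    return row[0]


total = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
for length in (8, 18, 19):
    # The closer the distinct count is to the row count, the more selective the prefix.
    print(length, distinct_prefixes(conn, length), "of", total)
```

With this sample data, a prefix of 8 characters leaves three rows indistinguishable, while 19 characters separate all five rows.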

Related

How to use a WHERE clause on a primary key? [duplicate]

This question already has an answer here:
Query Distinct values from a multi-valued column
(1 answer)
Closed 5 years ago.
I'm developing a quiz website. In my database, I need a table which shows
reported quiz errors. It should look like this:
+-----+-------------+--------------------------+-----------------+
| key | quiz_number | who_reported_this_error  | reported_number |
+-----+-------------+--------------------------+-----------------+
| 1   | 5           | goid482,saiai10,hahakaka | 3               |
| 2   | 3           | fiiai55,kihogi84         | 1               |
+-----+-------------+--------------------------+-----------------+
If a user named hanabi reports an error in quiz number 5,
I first need to check the who_reported_this_error column, because
I don't want a user to report the same error twice. If the user 'hanabi' doesn't exist in the who_reported_this_error column, I should update row 1.
Now for my problem. I want to find the row to update by its key column, and the key column's number should increase automatically. But I know that I can't use a WHERE clause on this primary key. How can I solve this problem?
The problem is with the table schema. NEVER store comma-separated data in a single column. You should structure the table to look more like this:
+-----+-------------+-------------------------+
| key | quiz_number | who_reported_this_error |
+-----+-------------+-------------------------+
| 1   | 5           | goid482                 |
| 2   | 3           | fiiai55                 |
| 3   | 5           | saiai10                 |
| 4   | 5           | hahakaka                |
| 5   | 3           | kihogi84                |
+-----+-------------+-------------------------+
You might also want a timestamp column on this table. Then, put a UNIQUE constraint on the quiz_number and who_reported_this_error columns to prevent the duplicates.
If you later need to see everyone who reported errors for quiz 5 in the same record, use MySQL's GROUP_CONCAT() function to build that information on the fly. Just don't store the data that way.
The key column has nothing to do with this question. You certainly can use your primary key in a WHERE clause. It just won't help you in this case because that data isn't relevant to the problem at hand.
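The one-row-per-report layout with a UNIQUE constraint can be sketched as follows. This is only an illustration, assuming Python's built-in sqlite3 module in place of MySQL; the table name quiz_error_reports, the column names, and the helpers report_error and reporters_for are hypothetical. SQLite's group_concat() plays the role of MySQL's GROUP_CONCAT().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE quiz_error_reports (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        quiz_number INTEGER NOT NULL,
        reporter TEXT NOT NULL,
        reported_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        UNIQUE (quiz_number, reporter)  -- one report per user per quiz
    )
    """
)


def report_error(conn, quiz_number, reporter):
    """Record a report; return False if this user already reported this quiz."""
    try:
        conn.execute(
            "INSERT INTO quiz_error_reports (quiz_number, reporter) VALUES (?, ?)",
            (quiz_number, reporter),
        )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a duplicate (quiz_number, reporter) pair.
        return False


def reporters_for(conn, quiz_number):
    """Build the comma-separated reporter list and the count on the fly."""
    return conn.execute(
        "SELECT group_concat(reporter), COUNT(*) "
        "FROM quiz_error_reports WHERE quiz_number = ?",
        (quiz_number,),
    ).fetchone()


for name in ("goid482", "saiai10", "hahakaka"):
    report_error(conn, 5, name)
first_attempt = report_error(conn, 5, "hanabi")   # new reporter, accepted
second_attempt = report_error(conn, 5, "hanabi")  # duplicate, rejected
names, total = reporters_for(conn, 5)
print(first_attempt, second_attempt, total, names)
```

The database does the duplicate check, so the application never has to parse a list of names, and the reported_number value is simply the row count.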

Find the ranking for a row with multiple values separated by a comma in mysql

I have a table in MySQL with three rows, and each row already holds multiple concatenated values (separated by commas). I want to rank the values using the FIND_IN_SET function, or any better function, to get their positions.
Table
id | NUMBERS |
1 | 30,40,10 |
2 | 58,29,21 |
3 | 18,25,51 |
I want to rank each row in this format
id | NUMBERS | POSITION |
1 | 30,40,10 | 2,1,3 |
2 | 58,29,21 | 1,2,3 |
3 | 18,25,51 | 3,2,1 |
I know the data representation and structure are wrong, but the data I currently have is stored like the above and there is a lot of it, so changing the structure would take me a lot of time, although I will change it later.
I need a workaround for how to do this. I would be grateful for your support, thanks.
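One possible workaround, until the structure is changed, is to compute the positions in application code after fetching each row. The sketch below assumes Python and a hypothetical helper rank_positions; it ranks in descending order, which matches the expected output above, and lets tied values share the better rank (the question does not say how ties should be handled).

```python
def rank_positions(numbers_csv):
    """Return the descending rank of each value, in the original order."""
    values = [int(part) for part in numbers_csv.split(",")]
    ordered = sorted(values, reverse=True)
    # index() returns the first match, so tied values share the better rank.
    return ",".join(str(ordered.index(value) + 1) for value in values)


rows = {1: "30,40,10", 2: "58,29,21", 3: "18,25,51"}
for row_id, numbers in rows.items():
    print(row_id, numbers, rank_positions(numbers))
```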

How can I implement a viewed system for my website's posts?

Here is my current structure:
// posts
+----+--------+----------+-----------+------------+
| id | title | content | author_id | date_time |
+----+--------+----------+-----------+------------+
| 1 | title1 | content1 | 435 | 1468111492 |
| 2 | title2 | content2 | 657 | 1468113910 |
| 3 | title3 | content3 | 712 | 1468113791 |
+----+--------+----------+-----------+------------+
// viewed
+----+---------------+---------+------------+
| id | user_id_or_ip | post_id | date_time |
+----+---------------+---------+------------+
| 1 | 324 | 1 | 1468111493 |
| 2 | 546 | 3 | 1468111661 |
| 3 | 135.54.12.1 | 1 | 1468111691 |
| 5 | 75 | 1 | 1468112342 |
| 6 | 56.26.32.1 | 2 | 1468113190 |
| 7 | 56.26.32.1 | 3 | 1468113194 |
| 5 | 75 | 2 | 1468112612 |
+----+---------------+---------+------------+
Here is my query:
SELECT p.*,
(SELECT count(*) FROM viewed WHERE post_id = :id) AS total_viewed
FROM posts p
WHERE id = :id
Currently I'm faced with a huge amount of data in the viewed table. What's wrong with my table structure (or database design)? In other words, how can I improve it?
A website like Stack Overflow has almost 12 million posts. Each post has (on average) 500 views. So the number of rows in viewed would be:
12000000 * 500 = 6,000,000,000 rows
Hah :-) .. Honestly, I cannot even read that number (and by the way, that number grows every second). So how does Stack Overflow handle the view count for each post? Does it always calculate count(*) from viewed each time a post is shown?
You are not likely to need partitioning, redis, nosql, etc, until you have many millions of rows. Meanwhile, let's see what we can do with what you do have.
Let's start by dissecting your query. I see WHERE id=... but no LIMIT or ORDER BY. Let's add to your table
INDEX(id, timestamp)
and use
WHERE id = :id
ORDER BY timestamp DESC
LIMIT 10
Any index is sorted by what is indexed. That is, the 10 rows you are looking for are adjacent to each other. Even if the data has been pushed out of cache, there will probably be only one block to read to provide those 10 rows.
But a "row" in a secondary index in InnoDB does not contain the data to satisfy SELECT *. The index "row" contains a pointer to the actual 'data' row. So, there will be 10 lookups to get them.
As for view count, let's implement that a different way:
CREATE TABLE ViewCounts (
post_id ...,
ct MEDIUMINT UNSIGNED NOT NULL,
PRIMARY KEY (post_id)
) ENGINE=InnoDB;
Now, given a post_id, it is very efficient to drill down the BTree to find the count. JOINing this table to the other, we get the individual counts with another 10 lookups.
So, you say, "why not put them in the same table"? The reason is that ViewCounts is changing so frequently that those actions will clash with other activity on Postings. It is better to keep them separate.
Even though we hit a couple dozen blocks, that is not bad compared to scanning millions of rows. And, this kind of data is somewhat "cacheable". Recent postings are more frequently accessed. Popular users are more frequently accessed. So, 100GB of data can be adequately cached in 10GB of RAM. Scaling is all about "counting the disk hits".
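Keeping such a counter table up to date could look like the following. This is a minimal sketch, assuming Python's built-in sqlite3 module instead of MySQL/InnoDB; the helpers record_view and view_count are hypothetical, and the INTEGER column type is an assumption since the original leaves the post_id type elided. In MySQL, the two statements in record_view could be collapsed into a single INSERT ... ON DUPLICATE KEY UPDATE ct = ct + 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ViewCounts ("
    " post_id INTEGER PRIMARY KEY,"  # assumed type; elided in the original
    " ct INTEGER NOT NULL DEFAULT 0)"
)


def record_view(conn, post_id):
    """Increment the counter for a post, creating the row on first view."""
    conn.execute(
        "INSERT OR IGNORE INTO ViewCounts (post_id, ct) VALUES (?, 0)", (post_id,)
    )
    conn.execute("UPDATE ViewCounts SET ct = ct + 1 WHERE post_id = ?", (post_id,))


def view_count(conn, post_id):
    """Primary-key lookup of the stored count; no COUNT(*) over a log table."""
    row = conn.execute(
        "SELECT ct FROM ViewCounts WHERE post_id = ?", (post_id,)
    ).fetchone()
    return row[0] if row else 0


for _ in range(3):
    record_view(conn, 1)
record_view(conn, 2)
print(view_count(conn, 1), view_count(conn, 2), view_count(conn, 3))
```

Reading a count is then a single primary-key lookup regardless of how many views have accumulated.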

Table design for a dictionary that can have words with many different spellings

I'm working on a small, personal dictionary database in Microsoft Access (the 2013 version). There are a lot of words in English that have two or even more spellings. Realistically speaking though, there are not that many words with three, let alone four spellings. Nevertheless, they do exist. Examples include aerie/aery/eyrie/eyry (a word with four spellings) and ketchup/catsup/catchup (a word with three spellings). Not to mention that English is literally rife with words that have two spellings. Everybody knows that (the differences between the American and British spelling systems come immediately to mind). So, I need to design my tables in such a way that there are no significant flaws with the design. I'm going to explain step by step what the database should look like and introduce the problems I have found with my current design along the way. So, here we go.
All words, obviously, should be stored in the same table. And I'm not going to include irrelevant aspects of the design such as other columns that might be part of the table (in reality, the database is much more complex). Let's focus on the most important parts. Here's what the Words table with some pre-filled sample data will look like:
+---------+-----------+
| word_id | word |
+---------+-----------+
| 1 | ketchup |
| 2 | catsup |
| 3 | catchup |
| 4 | moneyed |
| 5 | monied |
| 6 | delicious |
+---------+-----------+
To keep track of a group of words that are the same, but just have different spellings, it is probably wise to choose one of them as the main word and the other ones as its child words. Here's the diagram to show you how I envision that (here, ketchup and moneyed are main words, all the others child words):
All this information will be placed in a new table which we shall call the Alternative Spellings table (The columns word_id and alt_spell_word_id are going to be part of the table's compound primary key):
+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
| 1 | 2 |
| 1 | 3 |
| 4 | 5 |
+---------+-------------------+
Here's how all this looks in Access's Relationships panel (notice that I have enforced referential integrity between the word_id column of the Words table and the word_id column of the Alternative Spellings table and checked off the Cascade Delete Related Records option):
Although straight-forwardly simple, that's the only design I've been able to come up with so far. And I think that will basically do it. This is as simple as it gets. The problem with this design, however, is threefold:
1: This is not a serious problem, but I'd still like to hear your thoughts anyway. Every time I look up a word to see it in the Word Details form, I have to go through the entire Alternative Spellings table to see if it has other spellings associated with it or if it is a child word. So, I'd have to search both the word_id and alt_spell_word_id columns. And this process will be taking place for each and every word in the database every time I want to check its details. One possible solution is to create an additional Boolean column in the Words table that will keep track of whether a word has alternative spellings. This will indicate if we should scan the Alternative Spellings table at all when opening the word in the Word Details form. Here's what this would look like:
+---------+-----------+------------------+
| word_id | word | has_alt_spelling |
+---------+-----------+------------------+
| 101 | ketchup | yes |
| 102 | catsup | no |
| 103 | catchup | no |
| 104 | moneyed | yes |
| 105 | monied | no |
| 106 | delicious | no |
+---------+-----------+------------------+
I think that's a good design, but, as I said, I'd very much like to hear what you've got to say about this: a problem/not a problem? Your solution?
2: The other problem, which is of more serious nature, has to do with primary keys. word_id and alt_spell_word_id should be part of a compound primary key, of course. We don't want duplicate rows in the table. We all understand that. Not a problem. But here's what happens when we try to enforce referential integrity between the Words table and Alternative Spellings table (see the screenshot above). Everything is fine except that now we can associate a word with the id of a nonexistent word and the database is not going to complain because, for example, the last record in word_id has 4 in it, which is true, we do have a record with the id of 4 in the Words table, but there is no way to impose any kind of constraint on the alt_spell_word_id column. We can put any kind of nonsense in there:
+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
| 1 | 2 |
| 1 | 3 |
| 4 | 5 |
| 4 | 34564 |
+---------+-------------------+
I think that breaks the referential integrity of the database schema and thus is a serious problem. What kind of solution would you like to offer?
3: Another problem with this design is that if we want to delete a certain word from the Words table, the deletion will cascade through the Alternative Spellings table and delete all related records there, which is perfectly fine, but here's the catch: since we agreed that different words in the database can actually be just one word with different spellings, they all should be deleted along with the main word. But that's not going to happen as things stand at the moment. For instance, if I were to delete ketchup in the Words table, all related records in the Alternative Spellings table would be deleted. Fine. But we'd really get two dangling records, catchup and catsup—they can't exist on their own because they are part of the group where ketchup is the main word, but now it has been deleted:
+---------+-----------+
| word_id | word |
+---------+-----------+
| 2 | catsup |
| 3 | catchup |
| 4 | moneyed |
| 5 | monied |
| 6 | delicious |
+---------+-----------+
+---------+-------------------+
| word_id | alt_spell_word_id |
+---------+-------------------+
| 4 | 5 |
+---------+-------------------+
Here's the actual database (simplified version) if you want to play with it.
Thank you all in advance.
1) For 1, if you add indexes to the database, it is probably not a big concern (since your look-ups of a word then joining to get the alternate words will be fast). However, if a child word can have only one parent, then you do not need an additional table:
The word table can just be:
+---------+-----------+------------------+
| word_id | word | parent_word_id |
+---------+-----------+------------------+
| 101 | ketchup | |
| 102 | catsup | 101 |
| 103 | catchup | 101 |
| 104 | moneyed | |
| 105 | monied | 104 |
| 106 | delicious | |
+---------+-----------+------------------+
A query for a word and its children would then be:
select wordGroup.word
from word w join word wordGroup on
(w.word_id = wordGroup.parent_word_id
or wordGroup.word_id = w.word_id)
where w.word = {your_word};
A query for a word and associated words regardless of whether it was the child word or not would be:
select wordGroup.word
from word w join word wordGroup on
(w.word_id = wordGroup.parent_word_id
or wordGroup.word_id = w.word_id)
where wordGroup.word_id = {your_word};
2) The right way of doing this is to place a foreign key constraint (referential constraint) on the tables. In my example for 1, parent_word_id would have a referential constraint back to word(word_id). For your example, alt_spell_word_id would have a referential constraint back to the word_id column of the word table. You could then place a unique constraint on the combination of word_id and alt_spell_word_id. See (on Access constraints): https://msdn.microsoft.com/en-us/library/bb177889(v=office.12).aspx
3) I think deletion of a primary word has a meaning problem in your design. What does it mean to delete the primary word and keep the grouping? In theory, you would have to do a series of operations: first decide on a new primary word, then delete the old one. This would be true of almost any design including a primary word.
Another option is to not have a primary word but to have groups. This alters the db design from a one-to-many relationship between primary word and other words to a many-to-many between words. In this case, deletion is easy because you just cascade all associations to the word out of the word_groups table.
The resulting tables would be:
word:
+---------+-----------+
| word_id | word |
+---------+-----------+
| 101 | ketchup |
| 102 | catsup |
| 103 | catchup |
| 104 | moneyed |
| 105 | monied |
| 106 | delicious |
+---------+-----------+
word_groups:
+---------+-----------------+
| word_id | sibling_word_id |
+---------+-----------------+
| 101     | 102             |
| 101     | 103             |
| 102     | 101             |
| 102     | 103             |
| 103     | 101             |
| 103     | 102             |
| 104     | 105             |
| 105     | 104             |
+---------+-----------------+
Foreign key constraints protect referential integrity while indexes will make look-ups fast.
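The parent_word_id design from point 1, together with the foreign key from point 2, can be tried out in a few lines. This is a sketch only, assuming Python's built-in sqlite3 module rather than Access; the helpers try_insert and spellings are hypothetical, and ON DELETE CASCADE is one possible answer to point 3 (deleting a main word removes its child spellings), not something the answer above prescribes. The lookup uses COALESCE so that it works whether the given word is a parent or a child.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    """
    CREATE TABLE word (
        word_id INTEGER PRIMARY KEY,
        word TEXT NOT NULL UNIQUE,
        parent_word_id INTEGER REFERENCES word(word_id) ON DELETE CASCADE
    )
    """
)
conn.executemany(
    "INSERT INTO word (word_id, word, parent_word_id) VALUES (?, ?, ?)",
    [
        (101, "ketchup", None),
        (102, "catsup", 101),
        (103, "catchup", 101),
        (104, "moneyed", None),
        (105, "monied", 104),
        (106, "delicious", None),
    ],
)


def try_insert(conn, word_id, word, parent_word_id):
    """Insert a word; return False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO word (word_id, word, parent_word_id) VALUES (?, ?, ?)",
            (word_id, word, parent_word_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False


def spellings(conn, word):
    """All spellings in the same group as `word`, parent or child."""
    rows = conn.execute(
        """
        SELECT g.word
        FROM word AS w
        JOIN word AS g
          ON COALESCE(g.parent_word_id, g.word_id)
           = COALESCE(w.parent_word_id, w.word_id)
        WHERE w.word = ?
        ORDER BY g.word_id
        """,
        (word,),
    ).fetchall()
    return [row[0] for row in rows]


print(try_insert(conn, 107, "bogus", 34564))  # parent 34564 does not exist
print(spellings(conn, "catsup"))
```

Here the nonsense reference to word 34564 is rejected by the database itself, which is the behaviour the question asks for in problem 2.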
I think that I would use a model in which another table defines word_spelling_groups, so that for every word that can mean the same as "ketchup", there is an entry in this table with the same value of word_spelling_group as "ketchup"'s value of word_spelling_group.
An advantage of this would be that a word can be a member of multiple spelling groups, in case it had alternative spellings only in the context of a particular meaning (I struggle for an example).

Optimizing ENORMOUS MySQL View [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
Does MySQL view always do full table scan?
Running SELECT * FROM vAtom LIMIT 10 never returns (I aborted it after 48 hours);
explain select * from vAtom limit 10 :
+----+-------------+---------------+--------+-------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------+-----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------+--------+-------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------+-----------+---------------------------------+
| 1 | SIMPLE | A | ALL | primary_index,atom_site_i_3,atom_site_i_4 | NULL | NULL | NULL | 571294166 | Using temporary; Using filesort |
| 1 | SIMPLE | S | ref | primary_index | primary_index | 12 | PDB.A.Structure_ID | 1 | Using index |
| 1 | SIMPLE | C | eq_ref | PRIMARY,chain_i_1,sid_type,detailed_type | PRIMARY | 24 | PDB.A.Structure_ID,PDB.A.auth_asym_id | 1 | Using where |
| 1 | SIMPLE | AT | eq_ref | primary_index | primary_index | 24 | PDB.A.Structure_ID,PDB.A.type_symbol | 1 | Using index |
| 1 | SIMPLE | entityResidue | ref | PRIMARY | PRIMARY | 52 | PDB.S.Structure_ID,PDB.A.label_entity_id,PDB.A.label_seq_id,PDB.A.label_comp_id,PDB.C.Chain_ID | 1 | Using where; Using index |
| 1 | SIMPLE | E | ref | primary_index | primary_index | 12 | PDB.AT.Structure_ID | 1 | Using where |
+----+-------------+---------------+--------+-------------------------------------------+---------------+---------+------------------------------------------------------------------------------------------------+-----------+---------------------------------+
6 rows in set (0.00 sec)
You don't have to tell me that 600M rows is a lot. What I want to know is why it's slow when I only want 10 rows, and what can I do from here.
I'll be glad to post show create for anything per requests (don't want to make this post 7 pages long)
Tables can have a built-in sort order; this default kicks in on any query where you don't specify your own sorting. So your query is still trying to sort those 570+ million rows so it can find the first 10.
I'm not really surprised. Consider the case where you are simply joining two tables A and B and limiting the result set; it may be that only the last N rows from table A have matches, in which case the database would have to go through all the rows in A to get the N matching rows.
This would unavoidably be the case if there are lots of rows in 'B'.
You'd like to think that it would work the other way around when there are only a few rows in B, but obviously that's not the case here. Indeed, IIRC, LIMIT has no influence on the generation of a query plan, and even if it did, MySQL does not seem to cope with predicate pushdown for views.
If you provided details of the underlying tables, the number of rows in each and the view it should be possible to write a query referencing the tables directly which runs a lot more efficiently. Alternatively depending on how the view is used, you may be able to get the desired behaviour using hints.
It claims to be using a filesort. The view must have an ORDER BY or DISTINCT on an unindexed value, or the index is not specific enough to help.
To fix it, either change the view so that it does not need to sort, or change the underlying tables so that they have an index that will make the sort fast.
I think show create would be useful. It looks like you have a full table scan on vAtom. Maybe if you put an ORDER BY clause, after an indexed field, it would perform better.
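The effect of an index on a sorted LIMIT query, as described in the filesort answer above, can be observed on a small scale. The sketch below assumes Python's built-in sqlite3 module, not MySQL, and a hypothetical atom table with a made-up score column; SQLite reports the sort step as a temporary B-tree rather than "Using filesort", but the idea is the same: without a suitable index every row must be sorted before the first 10 can be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atom (id INTEGER PRIMARY KEY, structure_id TEXT, score INTEGER)"
)
conn.executemany(
    "INSERT INTO atom (structure_id, score) VALUES (?, ?)",
    [("s%d" % (i % 50), (i * 37) % 1000) for i in range(1000)],
)

QUERY = "SELECT * FROM atom ORDER BY score LIMIT 10"


def plan(conn, sql):
    """Return the human-readable steps of SQLite's query plan for `sql`."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(conn, QUERY)  # expected to include a sort step
conn.execute("CREATE INDEX idx_atom_score ON atom (score)")
plan_with_index = plan(conn, QUERY)     # expected to read rows in index order

print(plan_without_index)
print(plan_with_index)
```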