New data accidentally inserted in the middle rows - MySQL

I just inserted data with a form on my website. Normally the data is appended after the last row, like:
auto_increment name
1 a
2 b
3 c
4 d
5 e
But when I inserted new data last time, it appeared in the middle of the table, like:
17 data17
30 data30
18 data18
19 data19
20 data20
The newest row (data30) shows up in the middle of the table.
This happens to me rarely, but it still happens. Why does this happen, and how can I prevent it in the future? Thank you.

What you see is simply the order in which the engine returned the rows. Which record is fetched earlier and which later depends on a lot of factors. For one, don't think of your database table as a sequential file like FoxPro; it is far more sophisticated than that. Second, for every query that returns data, use an ORDER BY clause to avoid these surprises.
So always use:
SELECT columns FROM table ORDER BY column
The above will ensure you get the data in the order you need and won't be surprised when the engine finds a later record in cache while fetching an older record from slower media in another database file. The basics of RDBMS theory cover these concepts, and it is also worth studying how MySQL works internally.
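For example, applied to the table in the question (a minimal sketch, assuming the auto-increment column is named id and the table mytable; both names are placeholders):
-- rows come back in insertion-id order regardless of physical storage
SELECT id, name
FROM mytable
ORDER BY id;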
I found this great article that discusses the many wonderful features of a modern database query engine.
http://thinkingmonster.wordpress.com/mysql/mysql-architecture/
Although the entire article covers the topic well, you may want to pay extra attention to the section that talks about the Record Cache.

Related

Optimizing COUNT() on MariaDB for a Statistics Table

I've read a number of posts here and elsewhere about people wrestling to improve the performance of the MySQL/MariaDB COUNT function, but I haven't found a solution that quite fits what I am trying to do. I'm trying to produce a live-updating list of read counts for a list of articles. Each time a visitor visits a page, a log table in the SQL database records the usual access-log-type data (IP, browser, etc.). Of particular interest, I record the user's ID (uid) and I process the user agent string to classify known spiders (uaType). The article itself is identified by the "paid" column. The goal is to produce a statistic that doesn't count the poster's own views of the page and doesn't include known spiders, either.
Here's the query I have:
"COUNT(*) FROM uninet_log WHERE paid='1942' AND uid != '1' AND uaType != 'Spider'"
This works nicely enough, but very slowly (approximately 1 sec.) when querying against a database with 4.2 million log entries. If I run the query several times during a particular run, it increases the runtime by about another second for each query. I know I could group by paid and then run a single query, but even then (which would require some reworking of my code, but could be done) I feel like 1 second for the query is still really slow and I'm worried about the implications when the server is under a load.
I've tried switching out COUNT(*) for COUNT(1) or COUNT(id) but that doesn't seem to make a difference.
Does anyone have a suggestion on how I might create a better, faster query that would accomplish this same goal? I've thought about having a background process regularly calculate the statistics and cache them, but I'd love to stick to live updating information if possible.
Thanks,
Tim
Add a boolean "summarized" column to your statistics table and make it part of a multicolumn index with paid.
Then have a background process that produces/updates rows containing the read count in a summary table (by article) and marks the statistics table rows as summarized. (Though the summary table could just be your article table.)
Then your live query reports the sum of the already summarized results and the as-yet-unsummarized statistics rows.
This also allows you to expire old statistics table rows without losing your read counts.
(All this assumes you already have an index on paid; if you don't, definitely add one, and that alone will likely solve your problem for now, though in the long run you will probably still want to be able to delete old statistics records.)
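A rough sketch of that setup; apart from uninet_log, paid, uid, and uaType, every name here is hypothetical:
-- flag column plus a multicolumn index with paid
ALTER TABLE uninet_log
  ADD COLUMN summarized TINYINT(1) NOT NULL DEFAULT 0,
  ADD INDEX idx_paid_summarized (paid, summarized);

-- summary table maintained by the background process
CREATE TABLE read_count_summary (
  paid INT PRIMARY KEY,
  read_count BIGINT NOT NULL DEFAULT 0
);

-- live query: already-summarized total plus the not-yet-summarized tail
SELECT (SELECT COALESCE(SUM(read_count), 0)
        FROM read_count_summary
        WHERE paid = '1942')
     + (SELECT COUNT(*)
        FROM uninet_log
        WHERE paid = '1942' AND summarized = 0
          AND uid != '1' AND uaType != 'Spider') AS total_reads;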

MySQL Queries Pegging Server Resources -- Indexes aren't being used

I took on a volunteer project a few years ago. The site is set up with Joomla, but most of the articles are rendered with php scripts that pull info from non-Joomla tables. The database is now almost 50MB and several of the non-Joomla tables have 60,000+ rows -- I had no idea it would get this big. Even just pulling up the list of the articles that contain these scripts takes a long time -- and right now there are only about 30 of them. I initially thought the problem was because I'm on dial-up, so everything is slow, but then we started getting "resources exceeded" notices, so I figured I better find out what's going on. It's not a high traffic site -- we get less than 2,000 unique visitors in any given month.
In one particular instance, I have one table where the library holdings (books, etc.) are listed by title, author, pub date, etc. The second table contains the names mentioned in those books. I have a Joomla! article for each publication that lists the names found in that book. I also have an article that lists all of the names from all of the books. That is the query below -- but even the ones for the specific books that pull up only 1,000 or so entries are very slow.
I originally set up indexes for these tables (MyISAM), but when I went back to check, they weren't there. So I thought re-creating the indexes would solve the problem. It didn't, and according to EXPLAIN, they aren't even being used.
One of my problematic queries is as follows:
SELECT *
FROM pub_surnames
WHERE pub_surname_last REGEXP '^[A-B]'
ORDER BY pub_surname_last, pub_surname_first, pub_surname_middle
EXPLAIN gave:
id 1
select_type SIMPLE
table pub_surnames
type ALL
possible_keys NULL
key NULL
key_len NULL
ref NULL
rows 56422
Extra Using where; Using filesort
Also, phpMyAdmin says "Current selection does not contain a unique column."
All of the fields are required for this query, but I read here that it would help if I listed them individually, so I did. The table contains a primary key, and I added a second unique index containing the primary key for the table, as well as the primary key for the table that holds the information about the publication itself. I also added an index for the ORDER BY fields. But I still get the same results when I use EXPLAIN and the performance isn't improved at all.
I set these tables up within the Joomla! database that the site uses for connection purposes and it makes it easier to back everything up. I'm wondering now if it would help if I used a separate database for our non-Joomla tables? Or would that just make it worse?
I'm not really sure where to go from here.
I think you are probably approaching this the wrong way. It was probably the quick way to get it done when you first set it up, but now that the data has grown you are paying the price.
It sounds like you are recreating a massive list "inside" an article each time a page is rendered. Even though the source data is constantly being updated, you would probably be better off storing the results. (Assuming I understand your data structure correctly.) Not knowing exactly what your php scripts are doing makes it a little complicated; it could be that it would make more sense to build a very simple component that reads the data from the other tables, but I'll assume that doesn't make sense here.
Here's what I think you might want to do.
Create a cron job (really easy to make a script using Joomla; go take a look at the jacs repository) and use it to run whatever your php is doing. You can schedule it once a day, once an hour, or every 10 minutes, whatever makes sense.
Save the results. These could go into a database table, or you could cache them in the file system. Or both. Or possibly have the script update the articles, since they seem to be fixed (you aren't adding new ones, etc.). A sketch of the database-table variant follows this answer.
Then, when a user visits, you either read the article (if you stored the results there), have a component render the results, or make a plugin that manages the queries for you. You should not be running queries directly from inside an article layout; it's just wrong, even if no one knows it's there. If you have to run queries, use a content plugin similar to, say, the profile plugin, which does the queries in the right place architecturally.
Not knowing the exact purpose of what you are doing, it's hard to advise more, but I think if you are managing searches for people you'd likely be better off creating a way to use finder to index and search the results.
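To illustrate the database-table variant of "save the results" (everything here is a hypothetical sketch; adapt the names and column types to your schema):
-- cache table rebuilt by the scheduled script
CREATE TABLE surname_list_cache (
  pub_surname_last   VARCHAR(64),
  pub_surname_first  VARCHAR(64),
  pub_surname_middle VARCHAR(64),
  INDEX idx_last (pub_surname_last)
);

-- run on the schedule: wipe and repopulate with the expensive result
TRUNCATE TABLE surname_list_cache;
INSERT INTO surname_list_cache
SELECT pub_surname_last, pub_surname_first, pub_surname_middle
FROM pub_surnames
ORDER BY pub_surname_last, pub_surname_first, pub_surname_middle;
Page loads would then read from surname_list_cache instead of running the heavy query every time.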
Check out the suggestions below:
Try changing your database engine to InnoDB, which works better for large datasets.
Also, replace the REGEXP in the WHERE part of the query with an alternative; REGEXP cannot use an index and hugely affects execution time. See the sketch after this list.
Instead of selecting all the columns with "*", select only the columns you need.
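For instance, the REGEXP '^[A-B]' filter can be rewritten as a range comparison that is able to use an index (a sketch using the existing column names, and assuming a case-insensitive collation):
-- composite index matching both the filter and the sort order
CREATE INDEX idx_surname
  ON pub_surnames (pub_surname_last, pub_surname_first, pub_surname_middle);

-- range predicate instead of REGEXP: the index can satisfy it
SELECT pub_surname_last, pub_surname_first, pub_surname_middle
FROM pub_surnames
WHERE pub_surname_last >= 'A' AND pub_surname_last < 'C'
ORDER BY pub_surname_last, pub_surname_first, pub_surname_middle;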

DATABASE optimization insert and search

I was having an argument with a friend of mine. Suppose we have a db table with a userid and some other fields. This table might have a lot of rows. Let's also suppose that by design we limit the records for each userid in the table to about 50. My friend suggested that if I stored all the rows for each userid one after another, the lookup would be faster, e.g.
userid otherfield
1 .........
1 .........
.....until 50...
2 ........
etc. So when userid 1 is created, I pre-populate its 50 rows in the table with null values, and so on. The idea is that, since I know the number of rows, once I find the first row with userid = 1 I just have to look at the next 49, and voila, I don't have to search the whole table. Is this correct? Can this be done without indexing? Is the pre-population an expensive process? Is there a performance difference compared with just inserting the old-fashioned way, like
1 ........
2 ........
2 ........
1 ........
etc?
To answer a performance question like this, you should run performance tests on the different configurations.
But, let me make a few points.
First, although you might know that the records for a given id are located next to each other, the database does not know this. So, if you are searching for one user -- without an index -- then the engine needs to search through all the records (unless you have a limit clause in the query).
Second, if the data is fixed length (numerics and dates), then populating the rows with real values after populating them with NULL values will occupy the same space on the page. But if the data is variable length, a given page will initially be filled with empty records, and when you update those records with real values you will get page splits.
What you are trying to do is to outsmart the database engine. This isn't necessary, because MySQL provides indexes, which provide almost all the benefits that you are describing.
Now, having said that, there is some performance benefit from having all the records for a user co-located. If a user has 50 records, then reading the records with an index would typically require loading 50 pages into memory. If the records are co-located, then only one or two pages would need to be read. Typically, this would be a very small performance gain, because the most frequently accessed tables fit into memory. There might be some circumstances where the performance gain is worth it.
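For comparison, a plain index gives you the lookup behavior described above without any pre-population (a sketch; mytable is a placeholder name):
-- lets the engine jump straight to a user's rows instead of scanning
CREATE INDEX idx_userid ON mytable (userid);

-- now an index range scan touching only the userid = 1 rows
SELECT userid, otherfield
FROM mytable
WHERE userid = 1;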

Is there an indexable way to store several bitfields in MySQL?

I have a MySQL table which needs to store several bitfields...
notification.id -- autonumber int
association.id -- BIT FIELD 1 -- stores one or more association ids (which are obtained from another table)
type.id -- BIT FIELD 2 -- stores one or more types that apply to this notification (again, obtained from another table)
notification.day_of_week -- BIT FIELD 3 -- stores one or more days of the week
notification.target -- where to send the notification -- data type is irrelevant, as we'll never index or sort on this field, but it will probably store an email address.
My users will be able to configure their notifications to trigger on one or more days, in one or more associations, for one or more types. I need a quick, indexable way to store this data.
Bit fields 1 and 2 can expand to have more values than they do presently. Currently 1 has values as high as 125, and 2 has values as high as 7, but both are expected to go higher.
Bit field 3 stores days of the week, and as such, will always have only 7 possible values.
I'll need to run a script frequently (every few minutes) that scans this table based on type, association, and day, to determine if a given notification should be sent. Queries need to be fast, and the simpler it is to add new data, the better. I'm not above using joins, subqueries, etc as needed, but I can't imagine these being faster.
One last requirement: if I have 1000 different notifications stored in here, with 125 association possibilities, 7 types, and 7 days of the week, then using plain integers and storing multiple copies of each row instead of bit fields would produce too many record combinations for my taste, so it seems like using bit fields is a requirement.
However, from what I've heard, if I wanted to select everything from a particular day of the week, say Tuesday (b0000100 in a bit field, perhaps), bit fields are not indexed such that I can do...
SELECT * FROM `mydb`.`mytable` WHERE `notification.day_of_week` & 4 = 4;
This, from my understanding, would not use an index at all.
Any suggestions on how I can do this, or something similar, in an indexable fashion?
(I work on a pretty standard LAMP stack, and I'm looking for specifics on how the MySQL indexing works on this or a similar alternative.)
Thanks!
There's no "good" way (that I know of) to accomplish what you want to.
Note that the BIT datatype is limited to a size of 64 bits.
For bits that can be statically defined, MySQL provides the SET datatype, which is in some ways the same as BIT, and in other ways it is different.
For days of the week, for example, you could define a column
dow SET('SUN','MON','TUE','WED','THU','FRI','SAT')
There's no built-in way (that I know of) of getting the internal bit representation back out, but you can add 0 to the column, or cast it to unsigned, to get a decimal representation.
SELECT dow+0, CONVERT(dow,UNSIGNED), dow, ...
1 1 SUN
2 2 MON
3 3 SUN,MON
4 4 TUE
5 5 SUN,TUE
6 6 MON,TUE
7 7 SUN,MON,TUE
It is possible for MySQL to use a "covering index" to satisfy a query with a predicate on a SET column, when the SET column is the leading column in the index. (i.e. EXPLAIN shows 'Using where; Using index') But MySQL may be performing a full scan of the index, rather than doing a range scan. (And there may be differences between the MyISAM engine and the InnoDB engine.)
SELECT id FROM notification WHERE FIND_IN_SET('SUN',dow)
SELECT id FROM notification WHERE (dow+0) MOD 2 = 1
BUT... this usage is non-standard, and can't really be recommended. For one thing, this behavior is not guaranteed, and MySQL may change this behavior in a future release.
I've done a bit more research on this, and realized there's no way to get the indexing to work as I outlined above. So, I've created an auxiliary table (somewhat like the WordPress meta table format) which stores entries for day of week, etc. I'll just join these tables as needed. Fortunately, I don't anticipate having more than ~10,000 entries at present, so it should join quickly enough.
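A sketch of what I mean (the names and the day-numbering convention are just illustrative):
-- one row per notification per day, instead of a bit field
CREATE TABLE notification_day (
  notification_id INT NOT NULL,
  dow TINYINT NOT NULL,               -- 0 = SUN ... 6 = SAT
  PRIMARY KEY (notification_id, dow),
  INDEX idx_dow (dow)                 -- "all Tuesday rows" can use this index
);

-- everything scheduled for Tuesday (dow = 2 under this convention)
SELECT n.*
FROM notification n
JOIN notification_day d ON d.notification_id = n.id
WHERE d.dow = 2;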
I'm still interested in a better answer if anyone has one!

Having a column 'number_of_likes' or have a separate column...?

In my project, I need to calculate 'number_of_likes' for a particular comment.
Currently I have following structure of my comment_tbl table:
id user_id comment_details
1 10 Test1
2 5 Test2
3 7 Test3
4 8 Test4
5 3 Test5
And I have another table 'comment_likes_tbl' with following structure:
id comment_id user_id
1 1 1
2 2 5
3 2 7
4 1 3
5 3 5
The above is sample data.
Question :
On my live server there are around 50K records, and I calculate the number_of_likes for a particular comment by joining the above two tables.
I need to know: is this OK?
Or should I add one more field to the comment_tbl table to record the number_of_likes, incrementing it by 1 each time a comment is liked, along with inserting a row into comment_likes_tbl?
Would that help me in any way?
Thanks in advance.
Yes, you should add a number_of_likes field to the comment_tbl table. It will cut out the unnecessary joining of tables.
This way you don't need join until you need to get who liked the comment.
A good example is the database design of Stack Overflow itself. Look at the Users table: they keep a Reputation field in the Users table itself, instead of joining and recalculating each user's reputation every time it is needed.
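A minimal sketch of keeping such a counter in sync (this assumes a number_of_likes column has been added to comment_tbl; the transaction is just a precaution so the two writes succeed or fail together):
-- assumes: ALTER TABLE comment_tbl ADD COLUMN number_of_likes INT NOT NULL DEFAULT 0;
START TRANSACTION;

INSERT INTO comment_likes_tbl (comment_id, user_id)
VALUES (2, 9);

UPDATE comment_tbl
SET number_of_likes = number_of_likes + 1
WHERE id = 2;

COMMIT;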
You can take a few different approaches to something like this
As you're doing at the moment, run a JOIN query to return the collated results of comments and how many "likes" each has
As time goes on, you may find this is a drain on performance. Instead, you could simply have a counter attached to each comment that increments. But you may find it useful to also keep your comment_likes_tbl table, as this will be a permanent record of who liked what, and when (otherwise, you would just have a single figure with no additional metadata attached).
You could potentially also have a solution where you simply store your user's likes in the comment_likes_tbl, and then a cron task will run, on a pre-determined schedule, to automatically update all "like" counts across the board. Further down the line, with a busier site, this could potentially help even out performance, even if it does mean that "like" counts lag behind the real count slightly.
(On top of these, you can also implement caching solutions to store temporary records of like values attached to comments; MySQL also has useful caching technology you can make use of.)
But what you're doing just now is absolutely fine, although you should still make sure you've set up your indexes correctly, otherwise you will notice performance degradation more quickly. (a non-unique index on comment_id should suffice)
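If that index is missing, here is a sketch of adding it, along with the collated-count query for the JOIN approach (table names as in the question):
-- speeds up both the per-comment count and the join
CREATE INDEX idx_comment_id ON comment_likes_tbl (comment_id);

SELECT c.id, c.comment_details, COUNT(l.id) AS number_of_likes
FROM comment_tbl c
LEFT JOIN comment_likes_tbl l ON l.comment_id = c.id
GROUP BY c.id, c.comment_details;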
Use the query: as the columns are foreign keys they will be indexed, and the query will be quick.
Yes, your architecture is good as it is and I would stick to it, for the moment.
Running too many joins can be a problem regarding performance, but as long as you don't have to face such problems, you shouldn't take care about it.
Even if you do run into performance problems, you should first:
check that you are using (foreign) keys, so that MySQL can look up the data very fast
take advantage of the MySQL query cache
use some sort of second caching layer, like memcached, to store the number of likes (as this is only an incremental value).
Using memcached would solve the problem of running too many joins and avoid creating a column that isn't really necessary.