Use Sphinx vs MySQL for a query with no text search

I have this doubt:
Suppose I have one big table with a relationship to a smaller table of users.
The idea is to search that really big table for dates later than a given date, order the results by a score (a BIGINT, for example), and fetch the related user info at the same time.
The result of this query can change every 10 minutes or so.
So there is no text search, but I have a really big table. Should I use Sphinx (or another search engine), or should I just use MySQL indexes?
If I use Sphinx, I'm sure I can get really fast results; but having to keep the index refreshed, even with delta indexing, may mean it makes little difference compared with MySQL indexing. Also, the changes to the table are not necessarily new inserts but updates, and I have read that real-time indexing and delta indexes can cause problems.
Maybe it would be better to use MySQL indexes, plus some kind of caching to avoid unnecessary queries.

Just use MySQL; you definitely don't need Sphinx for what you are doing.
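As a rough sketch of what "just use MySQL" could look like for the query described in the question: the table names (items, users), the column names, the index names, the date literal and the LIMIT are all assumptions made for illustration, not something the question states.

```sql
-- Hypothetical schema: a small users table and a big items table.
CREATE TABLE users (
  id   INT UNSIGNED PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE items (
  id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  user_id    INT UNSIGNED NOT NULL,
  created_at DATETIME NOT NULL,
  score      BIGINT NOT NULL,
  KEY idx_created_at (created_at),  -- candidate for the date filter
  KEY idx_score (score),            -- candidate for the ORDER BY
  KEY idx_user (user_id)            -- supports the join to users
) ENGINE=InnoDB;

-- EXPLAIN shows which index the optimizer actually picks for this data.
EXPLAIN
SELECT i.id, i.score, i.created_at, u.name
FROM items AS i
JOIN users AS u ON u.id = i.user_id
WHERE i.created_at > '2024-01-01'
ORDER BY i.score DESC
LIMIT 50;
```

Whether the date index or the score index wins depends on how selective the date filter is, so it is worth running EXPLAIN against real data before settling on one.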

Related

How to disable reindexing table for each insert/update in MySQL?

I have a log-type table in MySQL. There are indexes on 3 columns, because they speed up the statistics queries we run against that table.
However, being a log-type table, with a lot of inserts and very rare selects, it would make sense to disable reindexing the table on each insert. Is there a way to tell MySQL not to reindex the table automatically, leave the indexes outdated, and let us rebuild them on demand (somehow)?
The only way that comes to mind right now is to create the indexes before we run the statistics queries and drop them when those are done. Or is there a better way?
Creating a whole index for one query and then dropping it would be a waste of time. Creating the index would take at least as long as running the query without the help of an index.
By analogy, suppose you need to go to the store for some groceries, but it takes too long to walk there. So you walk further to the car dealership, buy a car, drive to the grocery store, then return the car. You could have just walked to the store in less time!
Besides, MySQL doesn't rebuild the whole index every time you insert. It only updates the existing index with a new value. Also, MySQL's storage engine is optimized to defer index updates and group them together for efficiency. You can read https://dev.mysql.com/doc/refman/8.0/en/innodb-change-buffer.html for details on that feature.
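To see how change buffering is configured on a given server, the two InnoDB system variables below can be inspected. This is only a quick check, and the values returned depend on the MySQL version and configuration.

```sql
-- Which operations on secondary indexes are buffered (e.g. all, inserts, none).
SHOW VARIABLES LIKE 'innodb_change_buffering';

-- Maximum size of the change buffer, as a percentage of the buffer pool.
SHOW VARIABLES LIKE 'innodb_change_buffer_max_size';
```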
Before you decide on any optimization, you should measure to make sure the optimization is needed. I understand that inserting to a table with no indexes is slightly quicker than a table with indexes, but is that difference crucial in your situation? Is the insert fast enough to keep up with the traffic when you have indexes? You might be trying to solve a problem needlessly.

MySQL Large Datasets

I have a large data set, over 40 GB, that I loaded into a MySQL table. I am trying to run simple queries like select * from tablename, but they take a gazillion minutes and eventually time out. If I set a limit, execution is fairly fast, e.g. select * from tablename limit 1000.
The table has over 200 million records.
I tried creating indexes on some columns, but that also failed after 3 hours of execution.
Any tips on working with these types of datasets?
The first thing you need to do is completely ignore all answers and comments advising some other, awesome, mumbo jumbo technology. It's absolute bullshit. Those things can't work in a fundamentally different way, because they're all constrained by the same problem: hardware.
Now, let's get back to MySQL. A plain SELECT * FROM my_table LIMIT 1000 is cheap, because MySQL can stop as soon as it has produced 1000 rows, which is why your limited query is fast. The trouble starts with an ORDER BY on a column without a usable index, or with a large OFFSET: for LIMIT 1000 OFFSET 1000000, MySQL still has to produce the first 1,000,000 rows and throw them away before it returns anything.
Yes, it takes time. Yes, it looks dumb. However, MySQL has no way to jump straight to row 1,000,000, so it helps to tell it where to start in terms of an indexed column.
To improve your search, you can use something like this (assuming you have a numeric primary key):
SELECT * FROM tablename WHERE id < 10000 LIMIT 1000;
In this case, instead of working with 200 million rows, MySQL works only with the rows whose PK is below 10,000. Much easier, much quicker, and still readable. The numbers can be tweaked at any point, and if you paginate from a scripting language, you can pass along the last id you saw so that MySQL starts from that id on the next query.
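The "start from the last id" idea can be sketched as follows. It assumes, as the answer does, a numeric primary key called id; the user variable @last_id and the page size of 1000 are illustrative.

```sql
-- First page: read the first 1000 rows in primary-key order.
SELECT * FROM tablename ORDER BY id LIMIT 1000;

-- Next page: the application remembers the largest id it received
-- (here pretended to be 1000) and asks for rows after it.
SET @last_id = 1000;
SELECT * FROM tablename WHERE id > @last_id ORDER BY id LIMIT 1000;
```

Because each query seeks directly to a position in the primary key, the cost of fetching a page stays roughly the same no matter how deep into the table it is.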
Also, you should be using the InnoDB engine and tuning it with innodb_buffer_pool_size, which is the magic sauce that makes MySQL fly.
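A minimal way to inspect and adjust the buffer pool is shown below. The 8 GiB figure is purely an assumption for illustration; a sensible value depends on how much RAM the server has and what else runs on it. Resizing at runtime requires MySQL 5.7.5 or later and sufficient privileges; on older versions the value is set in the configuration file and needs a restart.

```sql
-- Current buffer pool size, converted from bytes to GiB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gib;

-- Example only: resize the buffer pool to 8 GiB at runtime.
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024;
```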
For large databases, one should consider alternative solutions such as Apache Spark. MySQL reads the data from disk, which is a slow operation. Nothing can work as fast as a technology that is based on MapReduce. Take a look at this answer. It is true that with large databases, queries get very challenging.
Anyway, assuming you want to stick with MySQL: first of all, if you are using MyISAM, make sure to convert your tables to InnoDB. This is especially important if you have lots of read/write operations.
It is also important to partition the table, which splits it into smaller, more manageable pieces. It can also improve index performance.
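As an illustration of partitioning, here is a minimal sketch of a range-partitioned table. The table name, columns and year boundaries are assumptions. Note that MySQL requires the partitioning column to be part of every unique key, including the primary key, which is why created_at appears in the primary key here.

```sql
-- Hypothetical table partitioned by the year of created_at.
CREATE TABLE events (
  id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created_at DATE NOT NULL,
  payload    VARCHAR(255),
  PRIMARY KEY (id, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(created_at)) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
```

Queries that filter on created_at can then skip partitions that cannot contain matching rows, while queries that do not filter on it still have to touch every partition.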
Do not be too generous with adding indexes. Define indexes wisely. If an index does not need to be UNIQUE do not define it as one. If an index does not need to include multiple fields do not include multiple fields.
Most importantly, start monitoring your MySQL instance. Use SHOW ENGINE INNODB STATUS to investigate its performance.

What is MySQL indexing and how do you create an index?

Okay, MySQL indexing. Is indexing nothing more than having a unique ID for each row that will be used in the WHERE clause?
When indexing a table does the process add any information to the table? For instance, another column or value somewhere.
Does indexing happen on the fly when retrieving values or are values placed into the table much like an insert or update function?
Any more information that clearly explains MySQL indexing would be appreciated. And please don't just post a link to the MySQL documentation; it is confusing, and it is always better to get a personal response from a professional.
Lastly, why is indexing different from telling MySQL to look for values between two values? For example: WHERE create_time >= 'AweekAgo'
I'm asking because one of my tables has 220,000+ rows and it takes more than a minute to return values with a very simple SELECT statement, and I'm hoping indexing will speed this up.
Thanks in advance.
You were downvoted because you didn't make an effort to read or search for what you are asking. A simple Google search would have shown you the benefits and drawbacks of database indexes. Here is a related question on StackOverflow. I am sure there are numerous questions like that.
To put it without jargon: it is easier to locate books in a library if you arrange them on shelves numbered according to their subject area. You can then tell somebody to go to a specific location and pick up the book. That is what an index does.
Another example: imagine an alphabetically ordered admission list. If your name starts with Z, you just skip A to Y and go straight to Z, which is faster. Without the ordering, you would have to search through the whole list and might not find your name at all if you didn't look carefully.
A database index is a data structure that improves the speed of operations in a table. Indexes can be created using one or more columns, providing the basis for both rapid random lookups and efficient ordering of access to records.
You can create an index like this:
CREATE INDEX index_name
ON table_name ( column1, column2,...);
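Applied to the question's date filter, this could look like the sketch below. The table name my_table and the index name idx_create_time are assumptions, and the asker's 'AweekAgo' placeholder is written here as a date expression.

```sql
-- Index the column used in the WHERE clause.
CREATE INDEX idx_create_time ON my_table (create_time);

-- EXPLAIN shows whether the optimizer uses the new index
-- (look at the "key" column of the output).
EXPLAIN
SELECT *
FROM my_table
WHERE create_time >= NOW() - INTERVAL 7 DAY;
```

The index is a separate, sorted structure maintained alongside the table, so no column is added to the table itself; MySQL updates the index on every insert, update and delete rather than building it when the query runs.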
You might be working on a more complex database, so it's good to remember a few simple rules.
Indexes slow down inserts and updates, so you want to use them carefully on columns that are FREQUENTLY updated.
Indexes speed up where clauses and order by.
For further detail, you can read :
http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html
http://www.tutorialspoint.com/mysql/mysql-indexes.htm
There are many kinds of index, for example a hash, a trie, or a spatial index. Which one is used depends on the values. In MySQL it is most likely a B-tree (the default for InnoDB and MyISAM) or a hash (the default for MEMORY tables). Nothing really fancy, because most likely the fancy thing is expensive.

Are indexes good or bad for a large database?

I read on MySQL Performance Blog that when tables are large, it is better to scan full tables, instead of using indexes.
I have a table with tens of millions of rows. When I run queries without indexes, they are 24 times slower than with indexes. I know a lot of things may cause this (e.g., whether rows are stored sequentially), but can you please give me some hints about what might be happening, or how I should start examining this issue? I want to understand when using indexes is preferable and when it is not.
Thanks
The article says that when dealing with very large data sets, where the number of rows you need to work with approaches the number of rows in the table, using an index might hurt performance.
In this case, going through the index will indeed hurt performance, as long as you need more data than is present in the index.
To go through the index, the database engine first has to read large parts of the index table (it is a type of table), then for each row (or set of rows) from this result, go to the real table and start cherrypicking pages to read.
If, on the other hand, you only need to retrieve columns that are already part of the index table, then the database engine only has to read from that, and does not need to continue on to the full table for more data.
If you end up reading most or close to most of the actual table in question, all the work required to deal with the index might be more overhead than just doing a full table-scan to begin with.
Now, this is all the article is saying. For most work dealing with a database, using indexes is the exact right thing to do.
For instance, if you need to extract a small set of rows, going through an index instead of a full table scan will be many orders of magnitude faster.
In any case, if you're in doubt, you should do some performance profiling to find out how your application behaves under different types of load, and then start tweaking. Don't take a single article as a silver bullet for anything.
For instance, one way to speed up the example queries in the article that do a count on the pad column would be to create a single index covering both val and pad. That way, the count would be just an index scan rather than an index scan plus table lookups, and would run faster than the full table scan.
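A sketch of that covering index follows. The column names val and pad come from the article as described above; the table name fact, the index name and the range in the WHERE clause are assumptions for illustration.

```sql
-- One index that contains both the filtered column and the counted column.
CREATE INDEX idx_val_pad ON fact (val, pad);

-- If the index covers the query, the Extra column of EXPLAIN
-- reports "Using index", meaning no table rows are read.
EXPLAIN
SELECT COUNT(pad)
FROM fact
WHERE val BETWEEN 1 AND 100;
```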
Your best option is to know your data, and to experiment, and to know how the tools you use work, so indeed, learn more about indexes, but in the end, it is you who decides what is best for your program.
As always, it depends. I've so far never run into a scenario like the one described in that blog post. Using indexes on my queries against large tables (50+ million rows) has been on the order of 100 to 10,000 times faster than doing a full table scan.
There's probably no silver bullet here, you have to test for your particular data and your particular queries.
It is good practice to put an index on each column that you use in a WHERE clause.

MySQL performance

We are developing our database in MySQL with the InnoDB engine. The database contains a column of type varchar, with each entry containing about 3000 characters. We need to provide search on this column, and to speed it up we need to add an index on it. Can you give us some information on this?
Which type of index do we need to speed up the search? Is there anything else we should take care of to improve performance?
If by search you mean you'll be performing a query like this:
SELECT * from cars WHERE car LIKE '%{search_str}%'
Then I am afraid that even if you add a key to the car column, MySQL will still have to perform a full scan, and your query might potentially be very slow.
If you are planning on supporting a significant amount of data to be searched and expect high QPS numbers, I would recommend you have a look at the Apache Lucene project, which can drastically speed up any search query performed. Plus, it also supports full-text search.
Like ducky says, if you're going to query the column using a SQL LIKE with a leading wildcard, your query is going to be very slow, no matter what index you put on the column.
There are two options:
Switch to MyISAM instead of InnoDB and use full-text search on this column. This is done by placing a FULLTEXT index on the column. (Since MySQL 5.6, InnoDB supports FULLTEXT indexes as well, so switching engines is no longer required.)
Use a tool like Lucene for full-text searching.
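For the first option, a minimal sketch using the cars table and car column from the earlier answer. The index name and the search terms are assumptions, and on InnoDB this requires MySQL 5.6 or later.

```sql
-- Add a full-text index on the long text column.
ALTER TABLE cars ADD FULLTEXT INDEX ft_car (car);

-- Search with MATCH ... AGAINST instead of LIKE '%...%'.
SELECT *
FROM cars
WHERE MATCH(car) AGAINST('red convertible' IN NATURAL LANGUAGE MODE);
```

Full-text search matches whole words rather than arbitrary substrings, so it is not an exact drop-in replacement for LIKE '%...%'; whether that difference matters depends on what users are expected to search for.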