MySQL query and index: how to do it

I need a hand indexing a large table! I have no idea about indexing MySQL tables.
This is the query I use when I order rows from the table:
SELECT "posts.* AS `posts` , user.nickname AS nickname
FROM `posts`
LEFT JOIN user AS user ON (user.userid = posts.userid )
WHERE
posts.userid= '" . intval($bbinfo['userid']) . "'
ORDER BY posts.timestamp DESC
LIMIT $start , $_limit
"
How can I use an index on this table? Do I have to do something after inserting a new post, or alter the table somehow? Where, when and how can I index the table? Please help.

Just create the index and define the way it works; after that you have nothing to do. If the SQL storage engine thinks your index should be used, it will use it. And when you create or update data, the index is maintained automatically.
Now the hard part is the definition of the index.
You can see an index as an ordering, like in a phone book. Your phone book is ordered by city, then by last name and then by first name. It's an ordering stored next to the table that the engine can use to find results faster than it could by reading the whole table data.
In a phone book there is only one index, so the data itself is ordered on that index. In a database you can have several indexes, so they are stored next to the table and contain pointers to the real data addresses.
Indexes are very important when you search data. You can easily find people named Smith in New York. It's much harder to find all the Smiths in all US cities (with a phone book).
In your query you have two instructions that may benefit from an index: you are filtering by user and then ordering by timestamp.
If you create an index by user and then timestamp, the engine will already have the solution to your query simply by reading the index.
So I would create this one:
CREATE INDEX posts_user_and_timestamp_idx ON posts(userid, timestamp DESC);
And this index could be reused for all queries where you are simply filtering by user (like the phone book: you can easily extract the pages for one city), but not for queries where the only filter is the timestamp (you would need an index on the timestamp alone; it's hard to extract all the Smiths in all cities from a phone book).
So in fact the main problem with indexes is that they depend heavily on the queries you usually run on the database. If you never run the same sort of queries on a table, you will need a lot of different indexes. And an index takes a lot of space; many tables use 3 or 4 times more physical space for indexes than for the data itself.
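To check that the engine actually picks the index, you can run EXPLAIN on the query (a sketch using the tables from the question; the literal 42 and the LIMIT values stand in for the PHP variables; note that before MySQL 8.0 the DESC in an index definition is parsed but ignored, though the index can still be scanned backwards):
EXPLAIN SELECT posts.*, user.nickname AS nickname
FROM posts
LEFT JOIN user ON user.userid = posts.userid
WHERE posts.userid = 42
ORDER BY posts.timestamp DESC
LIMIT 0, 20;
-- with the index in place, the posts row of the plan should show
-- key = posts_user_and_timestamp_idx and Extra should not say "Using filesort"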

You should find a MySQL admin tool that works for you, since schema changes to your DBs, including adding indexes, are a very common task.
I use MySQL Workbench to do most of the schema manipulation, including setting indexes on tables. It's a free admin app for MySQL DBs; if you don't have it, download it:
http://dev.mysql.com/downloads/workbench/5.1.html
Open your DB in Workbench, right-click on the table you want to index and choose Alter Table..., then click on Indexes at the bottom of the window to open the index editor.
You can also use phpMyAdmin, which is a little more complex and a little harder to install, IMHO.
I drilled down into my Program Files directory (Windows XP) to find the phpMyAdmin executable file, which launched the app.
In phpMyAdmin 3.2.1, open your schema and click on the table; this presents a GUI menu that lets you easily specify an index via the lightning-bolt icon to the right of the column to be indexed.

You only need to add an index once; there is no need to worry about doing anything after every INSERT. Based on what you have in your post, I would try something like this:
CREATE INDEX posts_userid_idx ON posts(userid);
If that doesn't seem to work very well, I would then advise you to check the MySQL Documentation on CREATE INDEX and see if any of the available options would apply to your situation.
Based on your (revised) comment, you should add a PRIMARY KEY on postid as well:
ALTER TABLE posts ADD PRIMARY KEY (postid);
And yes, you should be able to run both of those commands in MySQL Workbench as you would any other query.

Related

Can I optimize such a MySQL query without using an index?

A simplified version of my MySQL db looks like this:
Table books (ENGINE=MyISAM)
id <- KEY
publisher <- LONGTEXT
publisher_id <- INT <- This is a new field that is currently null for all records
Table publishers (ENGINE=MyISAM)
id <- KEY
name <- LONGTEXT
Currently books.publisher holds values that keep getting repeated, while publishers.name holds each of those values exactly once.
I want to get rid of books.publisher and instead populate the books.publisher_id field.
The straightforward SQL code that describes what I want done, is as follows:
UPDATE books
JOIN publishers ON books.publisher = publishers.name
SET books.publisher_id = publishers.id;
The problem is that I have a big number of records, and even though it works, it's taking forever.
Is there a faster solution than using something like this in advance?:
CREATE INDEX publisher ON books (publisher(20));
Your question title says ".. optimize ... query without using an index?"
What have you got against using an index?
You should always examine the execution plan if a query is running slowly. I would guess it's having to scan the publishers table for each row in order to find a match. It would make sense to have an index on publishers.name to speed the lookup of an id.
You can drop the index later, but it wouldn't harm to leave it in, since you say the process will have to run for a while until other changes are made. I imagine the publishers table doesn't get updated very frequently, so INSERT and UPDATE performance on that table should not be an issue.
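For example (a sketch; because publishers.name is a LONGTEXT, MySQL requires a prefix length on the index, and the 20 characters here are an arbitrary choice):
CREATE INDEX publishers_name_idx ON publishers (name(20));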
There are a few problems here that might be helped by optimization.
First of all, a few thousand rows doesn't count as "big" ... that's "medium."
Second, in MySQL saying "I want to do this without indexes" is like saying "I want to drive my car to New York City, but my tires are flat and I don't want to pump them up. What's the best route to New York if I'm driving on my rims?"
Third, you're using a LONGTEXT column for your publisher. Is there some reason not to use a fully indexable datatype like VARCHAR(200)? If you do that, your WHERE clause will run faster, index or no index. Large-scale library catalog systems limit the length of the publisher field, so your system can too.
Fourth, from one of your comments this looks like a routine data-maintenance update, not a one-time conversion. So you need to figure out how to avoid repeating the whole deal over and over. I am guessing here, but it looks like newly inserted rows in your books table have a publisher_id of zero, and your query updates that column to a valid value.
So here's what to do. First, put an index on books.publisher_id.
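For example (a sketch; the index name is arbitrary):
CREATE INDEX books_publisher_id_idx ON books (publisher_id);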
Second, run this variant of your maintenance query (MySQL's multiple-table UPDATE syntax doesn't allow LIMIT, so the join is rewritten as a correlated subquery):
UPDATE books
SET publisher_id = (SELECT publishers.id
                    FROM publishers
                    WHERE publishers.name = books.publisher)
WHERE publisher_id = 0
LIMIT 100;
This will limit your update to rows that haven't been updated yet. It will also update at most 100 rows at a time. In your weekly data-maintenance job, re-issue this query until MySQL announces that it affected zero rows (look at mysqli::$affected_rows or the equivalent in your PHP-to-MySQL interface). That's a great way to monitor database update progress and keep your update operations from getting out of hand.
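If you drive the loop from SQL itself rather than from PHP, ROW_COUNT() reports the same number (a sketch; run it immediately after each batched UPDATE and stop looping when it returns 0):
SELECT ROW_COUNT();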
Your update query has invalid syntax, but you can fix that later. The way to make it run faster is to add a WHERE clause so that you only update the necessary records.

Optimized SELECT query in MySQL

I have a very large number of rows in my table, table_1. Sometimes I just need to retrieve a particular row.
I assume that when I use a SELECT query with a WHERE clause, it loops from the very first row until it finds a match.
Is there any way to make the query jump to a particular row and then start from that row?
Example:
Suppose there are 50,000,000 rows and the id I want to search for is 53750. What I need is for the search to start from row 50000, so that it saves the time of searching through the first 49999 rows.
I don't know the exact term for this, since I am no SQL expert!
You need to create an index : http://dev.mysql.com/doc/refman/5.1/en/create-index.html
ALTER TABLE table_1 ADD UNIQUE INDEX (id);
The way I understand it, you want to select a row with id 53750. If you have a field named id you could do this:
SELECT * FROM table_1 WHERE id = 53750
Along with indexing the id field, that's the fastest way to do it, as far as I know.
ALTER TABLE table_1 ADD UNIQUE INDEX (<column>);
Would be a great first step if it has not been generated automatically. You can also use:
EXPLAIN <your query here>
To see which kind of query works best in this case. Note that if a column keeps coming back in your WHERE clauses (now or anywhere in the future), it's a good idea to put an index on that column as well.
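For example, against the question's table (a sketch; the exact plan output varies between MySQL versions):
EXPLAIN SELECT * FROM table_1 WHERE id = 53750;
-- with a unique index on id, the plan should show type = const and rows = 1,
-- i.e. a single index lookup instead of a scan over 50,000,000 rows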
Create an index on the column you want to do the SELECT on:
CREATE INDEX index_1 ON table_1 (id);
Then, select the row just like you would before.
But also, please read up on databases, database design and optimization. Your question is full of false assumptions. Don't just copy and paste our answers verbatim. Get educated!
There are several things to know about optimizing SELECT queries, like range and WHERE-clause optimization; the documentation is pretty informative about this issue, read the section Optimizing SELECT Statements. Creating an index on the column you filter on is very helpful for performance too.
One possible solution: you can create views and then query the view. Details of creating a view and obtaining data from it are here:
http://www.w3schools.com/sql/sql_view.asp
You just split that huge number of rows across many views (i.e. rows 1-10000 in one view, rows 10001-20000 in another), then query the appropriate view.
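A sketch of what one such view could look like (the boundaries and the view name are illustrative; keep in mind that a view is just a stored query, so without an index on id MySQL still scans the underlying table):
CREATE VIEW table_1_part6 AS
SELECT * FROM table_1 WHERE id BETWEEN 50001 AND 60000;

SELECT * FROM table_1_part6 WHERE id = 53750;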
I am pretty sure that any SQL database with a little self-respect does not loop from the first row to find the desired row. But I am also not sure how they make it work, so I can't give an exact answer.
You could check what's in your WHERE clause and how the table is indexed. Do you have a proper primary key, e.g. one with a numeric data type? Do you have indexes on the other columns that are used in your queries?
There is also a lot to consider when installing the database server, like where to put the data and log files, how much memory to give the server and how to set the file growth. There's a lot you can do to tune your server.
You could try to split your table into partitions.
More about alter tables to add partitions
Selecting from a specific partition
In your case you could create a partition on id for every 50,000 rows, and when you want to skip the first 50,000 you just select from partition 2. How to do this is explained quite well in the MySQL documentation.
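A sketch of both steps (the boundaries and partition names are illustrative; MySQL requires the partitioning column to be part of every unique key, which is fine here since id is the primary key, and the PARTITION (...) selector in SELECT needs MySQL 5.6+):
ALTER TABLE table_1 PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (50000),
    PARTITION p1 VALUES LESS THAN (100000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

SELECT * FROM table_1 PARTITION (p1) WHERE id = 53750;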
You may try something as simple as this:
query = "SELECT * FROM tblname LIMIT 50000, 1";
I just tried it with phpMyAdmin; the 50000 is the offset of the row the lookup starts at.
EDIT :
But if I were you I wouldn't use this one, because it still has to pass over records 1-49999 to get there.

How to merge 2 records in InnoDB MySQL databases

This is related to How to change ID in mysql
I also have checked other questions and none are quite like this one.
As we know, InnoDB has a feature: if I change the ID of a record, for example, then all other tables that point to the previous ID will magically be updated.
What if I want to MERGE 2 records?
Say I have 2 businesses.
They have 2 IDs.
I want to merge them into one. I also want to use InnoDB's awesome feature to automatically change things.
I can't just change one of the IDs to the other ID. Or can I?
What would you do to merge 2 similar records in a database?
Of course, what actually goes into the combined record will be a business decision.
Basically I just do not want to pinpoint all the other tables one by one. I think the ON UPDATE rule is there for a reason. Is there a way where I just change slaveID to masterID, keep ALL the data in master the same, and then have the database itself (rather than my program) repoint all the tables that point to slaveID so they point to masterID? Of course, the records for slaveID will be gone anyway.
For example, with a normal MySQL engine, you can change an ID, and then you have to go through every table that points to the old ID and point it at the new ID instead. With InnoDB, that repointing is done by the database engine itself, which is kind of cool. Why would anyone use a non-InnoDB engine anyway?
I want to do the same but for merging.
Trying to set a record's primary key to an already existing value will simply result in a key-violation error. While this looks simple at first glance, it has a side effect: you cannot use ON UPDATE CASCADE to merge two records. It will simply not work.
If you have the possibility to change the schema, you can use the old but good redirect trick:
(Assuming your IDs are positive, maybe unsigned ints.)
Add a field redirect int not null default 0
Create a view:
CREATE VIEW tablename_view AS
SELECT
-- repeat the next line for every field apart from redirect
IF(s.redirect > 0, m.<fieldname>, s.<fieldname>) AS <fieldname>
FROM tablename AS s
LEFT JOIN tablename AS m ON s.redirect = m.id;
When you merge a record (slave) into another record (master), run UPDATE tablename SET redirect=<id_of_master> WHERE id=<id_of_slave>
Adapt your SELECT queries to select from tablename_view instead of tablename
Create and use a maintenance script to weed out merged slaves
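A minimal end-to-end sketch of the trick, using a hypothetical business table with just an id and a name (the table, names and IDs are illustrative, not from the question):
ALTER TABLE business ADD COLUMN redirect INT NOT NULL DEFAULT 0;

CREATE VIEW business_view AS
SELECT IF(s.redirect > 0, m.id, s.id) AS id,
       IF(s.redirect > 0, m.name, s.name) AS name
FROM business AS s
LEFT JOIN business AS m ON s.redirect = m.id;

-- merge business 42 (slave) into business 7 (master):
UPDATE business SET redirect = 7 WHERE id = 42;
-- from now on, selecting from business_view shows row 42 with the data of row 7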

Using Sphinx for the first time - configuring the sql_query key

I'm currently practicing with Sphinx; I haven't done much so far apart from the configuration, which is what I'm stuck on. The sql_query key leaves me somewhat confused about what to put there. I read the Sphinx documentation on sql_query, but it doesn't clear things up, since I have many SELECTs in my web application and the SQL changes often (with user search filtering).
Since my search currently uses MySQL, I want to integrate Sphinx into my web application. If the sql_query key is not optional, should I expect to put my whole search SQL query into that field, or do I just pick the necessary fields from the tables to build the index?
Can someone point me in the right direction so I can get things going well with Sphinx and my web application?
sql_query is mandatory; it's run by Sphinx to fetch the data you want indexed from MySQL. It can have joins, conditions, etc., but it must be a valid SQL query. You should have something like SELECT id, field1, field2, fieldx FROM table, where id must be a primary ID. Each row returned by this query is considered a document (which is what Sphinx returns when you search).
If you have multiple tables (that are very different in meaning: users, articles, etc.), you need to create an index for each.
Read the tutorials here to understand how Sphinx works: http://sphinxsearch.com/info/articles/
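To make that concrete, a minimal sketch of the relevant part of sphinx.conf (the source/index names, credentials, table and field names are all illustrative, not from the question):
source posts_src
{
    type               = mysql
    sql_host           = localhost
    sql_user           = myuser
    sql_pass           = mypass
    sql_db             = mydb
    sql_query          = SELECT postid, userid, UNIX_TIMESTAMP(created_at) AS created, title, body FROM posts
    sql_attr_uint      = userid
    sql_attr_timestamp = created
}

index posts_idx
{
    source = posts_src
    path   = /var/lib/sphinx/posts_idx
}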
You can create a SQL query to get a union set of records from the database. If you do multiple-table joins and queries to select the best result set, you can do that with Sphinx too.
You may run into a few problems with your existing table structure in the database.
Like:
The base table does not have an integer primary key field.
Create a new table which has two fields: one for the integer id and the other to hold the primary key of the base table. Do an inner join with that table and select the id field from it.
Eg. SELECT t1.id, t2.name, t2.description, t2.content FROM table_new t1 INNER JOIN table_2 t2 ON t1.document_id = t2.thread_id ... (rest of your SELECT query)
The t1.id is for the Sphinx search engine to do its internal indexing.
You filter data by placing a WHERE clause and filtering.
You can do that in Sphinx by setting filters dynamically based on the conditions (see the SphinxQL sketch below).
You select and join different tables to get results.
This can also be done by setting up different sources and indexes based on your requirements.
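For example, a user's search filters can be expressed as attribute filters in SphinxQL (a sketch; posts_idx and userid refer to the illustrative config above):
SELECT * FROM posts_idx WHERE MATCH('some search words') AND userid = 42 LIMIT 0, 20;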
Hope this helps you get an understanding of what you need to add and modify, and of how the Sphinx search engine can be configured to your requirements. Just come back here if you need more help.

adding data to interrelated tables... easier way?

I am a bit rusty with MySQL and trying to jump in again, so sorry if this is too easy a question.
I basically created a data model that has a table called "Master" (with required fields name and IDcode) and a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) Values (name, updateDate)
I get an error saying IDcode on Details doesn't have a default value; so I add one, and then it complains that the field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details, and if no IDcode exists, I want to add an entry to the Master table. The problem is I have to first add the name to the Master table, wait for a unique ID to be generated (for IDcode), then figure that ID out and add it to my query when I enter the Details data. As you can imagine, the queries are going to get quite long, since I have many tables.
Is there an easier way, where every time I add something it searches by name whether the foreign key exists, and if not, adds it to all the tables it's linked to? Is there a standard way people do this? I can't imagine that, with all the complex databases out there, people haven't figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
P.S. This may be a different question, but I have heard that Django for Python helps create queries; would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) to fill those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a master ID. You could add the missing master records by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
`key`
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.key
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, a join mechanism very well suited to fully joining two tables. MySQL uses nested loops instead, which means that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point. When you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through CSV files in your editor.
I don't think there's a one-step way to do this.
What I do is issue a
INSERT IGNORE INTO master (..) VALUES (..)
to the master table, which will either create the row if it doesn't exist, or do nothing, and then issue a
SELECT id FROM master where someUniqueAttribute = ..
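A sketch of that two-step pattern against the question's Master table (Name and IDcode come from the question; the UNIQUE key is the important assumption here, since INSERT IGNORE only skips rows that would violate one, and it presumes Name is an indexable type such as VARCHAR):
ALTER TABLE Master ADD UNIQUE KEY (Name);
INSERT IGNORE INTO Master (Name) VALUES ('Some Name');
SELECT IDcode FROM Master WHERE Name = 'Some Name';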
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether they would help performance.