How to join part of two tables, rather than the whole tables - MySQL

I was just wondering: suppose we have two tables in MySQL and both have millions of records,
and we want to join only part of the two tables. How can we accomplish that?
As I understand it, if we use LIMIT then MySQL first queries all records and then limits the result.
I will appreciate your suggestions. Thanks.
Consider there are two tables:
user (id is auto-increment):
id, name, email, gender
user_actions (user_id is a foreign key here):
id, user_id, action_id, action_date
We need to query the actions for a user, but not all at once, as the tables hold a huge amount of data.

You should add an index on the date field, and then use a WHERE clause that says greater than one date and less than another. With the index, MySQL does not have to scan all records before applying the LIMIT.

You can use a WHERE clause to limit the records used in the join. You'll have to provide more details about how you want to choose the part of the table if you want a more specific answer.

If you want to show the result to an end user, you can use a paging strategy to show the result page by page (50 rows per page, for example).
You can use MySQL's LIMIT keyword to produce each page.
You can find an example here.
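A minimal sketch of that paging strategy, using Python's sqlite3 as a stand-in for MySQL (the `user_actions` table is from the question; the index name, sample data, and `fetch_page` helper are made up for illustration). It uses keyset pagination (`WHERE id > last_seen_id ... LIMIT n`), which avoids re-reading the rows of earlier pages the way a large OFFSET would:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions ("
             "id INTEGER PRIMARY KEY, user_id INTEGER, "
             "action_id INTEGER, action_date TEXT)")
# Invented sample data: 25 actions for user 7.
conn.executemany(
    "INSERT INTO user_actions (user_id, action_id, action_date) VALUES (?, ?, ?)",
    [(7, a, f"day-{a:02d}") for a in range(1, 26)])
# Index so each page is a short index range scan, not a table scan.
conn.execute("CREATE INDEX idx_actions_user ON user_actions (user_id, id)")

def fetch_page(user_id, last_seen_id=0, page_size=10):
    # Keyset pagination: resume from the last id the client saw.
    return conn.execute(
        "SELECT id, action_id, action_date FROM user_actions "
        "WHERE user_id = ? AND id > ? ORDER BY id LIMIT ?",
        (user_id, last_seen_id, page_size)).fetchall()

page1 = fetch_page(7)
page2 = fetch_page(7, last_seen_id=page1[-1][0])
```

The same pattern works unchanged in MySQL; the `(user_id, id)` index lets each page touch only the rows it returns.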

Related

Where to set IN criteria (Parent table or Child table) in mysql

We have two tables (Customer and Order). Is there any performance difference between the two queries given below?
The Customer table has customer details (customerId, customerdetails).
The Order table has the order details of the customer (orderId, customerId, orderdetails), i.e. customerId is duplicated here, is not nullable, and is a foreign key with ON DELETE CASCADE.
eg query:
select * from Order where customerId in (1,2,3,4,....)
or
select Order.* from Order
inner join customer on Order.customerId = customer.customerId
where customer.customerId in (1,2,3,4,....)
The first one does less work: it only references one table. Therefore it's faster.
The second one has an INNER JOIN. That means it must check each Order row to ensure it has a matching Customer row. (If a Customer is DELETEd, her orders won't appear in the result set.) Rows that don't match must not appear in the result set. Doing that check takes at least some work. But if your tables are small you'll probably be unable to measure any significant difference.
You can investigate this yourself by prefixing EXPLAIN to each query. It tells you how the query planner module will satisfy the query. Your second query will have two EXPLAIN rows.
Indexes will help. Your first query will benefit if you create this one:
ALTER TABLE Order ADD INDEX customerId (customerId);
Welcome to Stack Overflow. When you have other query-optimization questions, you probably should read this.
A General Rule: One query involving two tables and a JOIN will be faster than two queries where you are handing ids from the first to the second. And it is less code (once you are familiar with Joins).
Why? A roundtrip from the client to the server takes some effort. Two roundtrips is slower than one.
Does it matter? Not a lot. Usually, the difference is millisecond(s), maybe less.
Always? No. That is why I said "General Rule".
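To make the comparison concrete, here is a small sqlite3 sketch (standing in for MySQL; the sample data is invented, and `Order` is renamed `orders` because ORDER is a reserved word). It shows that with the foreign key intact, both queries return identical rows, and the query plan for the join references both tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customerId INTEGER PRIMARY KEY, customerdetails TEXT);
CREATE TABLE orders   (orderId INTEGER PRIMARY KEY, customerId INTEGER, orderdetails TEXT);
CREATE INDEX idx_orders_customer ON orders (customerId);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, f"cust {i}") for i in range(1, 6)])
# 20 invented orders spread over the 5 customers.
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(o, 1 + o % 5, f"order {o}") for o in range(1, 21)])

q_in   = "SELECT * FROM orders WHERE customerId IN (1, 2, 3)"
q_join = ("SELECT orders.* FROM orders JOIN customer "
          "ON orders.customerId = customer.customerId "
          "WHERE customer.customerId IN (1, 2, 3)")

rows_in   = sorted(conn.execute(q_in).fetchall())
rows_join = sorted(conn.execute(q_join).fetchall())
# Every order has a matching customer, so the join filters nothing extra;
# it just does additional lookups, visible in the plan.
plan = conn.execute("EXPLAIN QUERY PLAN " + q_join).fetchall()
```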

Quickly selecting the last n records from a MySQL database of size 700 MB

I'm developing an application wherein I have to display the last n records based on user input. The queries run fine when there are few records, but as the records grow it takes a lot of time to fetch them, which in turn is affecting my UI. So I want to ask: is there any better way to get the records quickly from the DB? I already have a PK column, and the table has 26 columns.
Thanks,
Karthick V
Use indexes on your table fields to fetch data fast.
http://www.tutorialspoint.com/sql/sql-indexes.htm
https://msdn.microsoft.com/en-us/library/ms190457.aspx
http://www.sql-tutorial.com/sql-indexes-sql-tutorial/
If you only need the last records inserted into a single table, without regard to any other values, then ORDER BY <PRIMARYKEY> DESC LIMIT <N>
should do the trick, provided you have an auto-increment primary key. ORDER BY ... DESC means descending order.
SELECT <columns> FROM <table> order by <idfield> desc limit <n>
If you have multiple tables being joined together, then use ALTER TABLE <table> ADD INDEX (<column>) on the columns used for joins to make the searching and joining process faster.
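A quick sqlite3 sketch of the accepted advice (table and column names invented for illustration): because the primary key is indexed, `ORDER BY id DESC LIMIT n` reads only the last n entries instead of scanning the whole table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings ("
             "id INTEGER PRIMARY KEY AUTOINCREMENT, value REAL)")
# 1000 invented rows; ids auto-increment 1..1000.
conn.executemany("INSERT INTO readings (value) VALUES (?)",
                 [(v / 10,) for v in range(1000)])

def last_n(n):
    # The primary key is already indexed, so this walks the index
    # backwards and stops after n rows -- no full scan.
    return conn.execute(
        "SELECT id, value FROM readings ORDER BY id DESC LIMIT ?",
        (n,)).fetchall()

rows = last_n(5)
```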

How to speed up a slow query caused by GROUP BY?

I have an issue when using GROUP BY while selecting all the columns from the table, which results in poor performance in terms of speed.
Select * from employee
group by customer_id;
The query above cannot be changed; it is mandatory and fixed. It takes 17720 ms, which is too long; the result must come back in a shorter time, below 1 minute as my desired result. Since the table has many columns and records, the query takes much time. Is there any solution to this problem? Thanks.
For as simple as your query is, it appears almost pointless... You would not have duplicate employee IDs within an employee table, and doing a group by would still result in returning every row, every column.
However, that said, to optimize a GROUP BY, you would need an index on that column ... which I would think would already exist as the employee ID would probably be the primary key to the table.
Additionally, you don't have any aggregate columns that would warrant a GROUP BY. Are you instead just trying to LOOK for a specific employee? If so, that would be a different query, using a WHERE clause for the criteria you are looking for.
FEEDBACK...
You updated your question and did a GROUP BY on CUSTOMER ID (not employee ID). OK, but what do you really mean to group by?
OR did you want to ORDER by customer? In other words: "I want a list of all employees, but want them sorted by the customer they are associated with." If this is the case, you would want something like...
select *
from employees
ORDER BY
customerID,
employeeLastName,
employeeFirstName
Without seeing your table structure(s): if the employee table DOES have a column for the customer ID they are associated with, this query will put all employees for the same customer together, pre-sorted by customer, and within each customer sorted by the employee's name (last, first).
If you have another table(s) with relationships between employees and customers, we would need to see that too to better offer an answer.
Columns with heavy types like BLOB, TEXT, or NVARCHAR(200 or more) will slow down your query by a lot if you have a lot of records. I suggest checking whether it is really necessary to load them all from the start.
Also, your GROUP BY seems weird. What exactly are you trying to achieve with it?
The GROUP BY is not just weird, it is wrong. If you don't specify all the non-aggregate columns in the GROUP BY, you get seemingly random values for each column. Remove the GROUP BY or explain why you think you need it.
Or maybe the "*" is not correct. OK, you cannot show us your real column names, at least show us the real pattern to the SELECT, even if it has bogus column names.
I'm also confused as to why you call it a "search". There is no WHERE clause, which is where "search" criteria goes.
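To illustrate the aggregate point with a small sqlite3 example (the `name` column and sample data are guesses, not the asker's real schema): a GROUP BY is well-defined only when every selected column is either in the GROUP BY or aggregated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee ("
             "id INTEGER PRIMARY KEY, name TEXT, customer_id INTEGER)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ann", 10), (2, "Bob", 10), (3, "Cid", 20)])

# Well-defined: customer_id is grouped, COUNT(*) is aggregated.
counts = conn.execute(
    "SELECT customer_id, COUNT(*) FROM employee "
    "GROUP BY customer_id ORDER BY customer_id").fetchall()
# By contrast, "SELECT * ... GROUP BY customer_id" returns one arbitrary
# row per group -- which row you get is not guaranteed.
```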

SQL ORDER BY performance

I have a table with more than 1 million records. The problem is that the query takes too much time, like 5 minutes. The ORDER BY is my problem, but I need the expression in the ORDER BY to get the most popular videos, and because of the expression I can't create an index on it.
How can I resolve this problem?
Thx.
SELECT DISTINCT
`v`.`id`,`v`.`url`, `v`.`title`, `v`.`hits`, `v`.`created`, ROUND((r.likes*100)/(r.likes+r.dislikes),0) AS `vote`
FROM
`videos` AS `v`
INNER JOIN
`votes` AS `r` ON v.id = r.id_video
ORDER BY
(v.hits+((r.likes-r.dislikes)*(r.likes-r.dislikes))/2*v.hits)/DATEDIFF(NOW(),v.created) DESC
Does the most-popular list have to be calculated every time? I doubt the answer is yes. Some operations will take a long time to run no matter how efficient your query is.
Also bear in mind you have 1 million records now; you might have 10 million in the next few months. So the query might work now but not in a month - the solution needs to be scalable.
I would make a job to run every couple of hours to calculate and store this information on a different table. This might not be the answer you are looking for but I just had to say it.
What I have done in the past is to create a voting system based on integers.
Nothing will outperform integers.
The voting-system table has 2 columns:
ProductID
VoteCount (INT)
VoteCount stores the running total of all the votes that are submitted:
Like = +1
Unlike = -1
Create an index on the vote table based on the ID.
You have two alternatives to improve this:
1) create a new column with the needed value pre-calculated
2) create a second table that holds the videos' primary key and the result of the calculation.
In the first case this could be a calculated column; otherwise, modify your app or add triggers to keep it in sync (you'd need to load it manually the first time, and later let your program keep it updated).
If you use the second option, your key could be composed of the finalRating plus the primary key of the videos table. This way your searches would be hugely improved.
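A sketch of the pre-calculated-rating idea in sqlite3 (the table layout, sample data, and the idea of a periodic refresh job are assumptions, not the asker's schema): the hot query sorts on a plain indexed column instead of the expression:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT,
                     hits INTEGER, rating REAL);
CREATE INDEX idx_videos_rating ON videos (rating DESC);
""")
# Invented data; in practice a periodic job (or trigger) would compute
# `rating` from hits/likes/dislikes and write it here.
conn.executemany("INSERT INTO videos VALUES (?, ?, ?, ?)",
                 [(i, f"video {i}", i * 10, (i * 37) % 100 / 10)
                  for i in range(1, 101)])

# The hot query reads the precomputed, indexed value; no per-row
# arithmetic and no filesort over the whole table.
top = conn.execute(
    "SELECT id, title, rating FROM videos "
    "ORDER BY rating DESC LIMIT 3").fetchall()
```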
Have you tried moving the arithmetic of the ORDER BY into your SELECT, and then ordering by the virtual column, such as:
SELECT (col1+col2) AS a
FROM TABLE
ORDER BY a
Arithmetic in the sort is expensive.

What's wrong with this MySQL query

I have the following SQL query. It seems to run OK, but I am concerned that as my site grows it may not perform as expected. I would like some feedback on how effective and efficient this query really is:
select * from articles where category_id=XX AND city_id=XXX GROUP BY user_id ORDER BY created_date DESC LIMIT 10;
Basically, what I am trying to achieve is to get the newest articles by created_date, limited to 10. Articles must only be selected if the following criteria are met:
City ID must equal the given value
Category ID must equal the given value
Only one article per user must be returned
Articles must be sorted by date and only the top 10 latest articles must be returned
You've got a GROUP BY clause which only contains one column, but you are pulling all the columns there are without aggregating them. Do you realise that the values returned for the columns not specified in GROUP BY and not aggregated are not guaranteed?
You are also referencing such a column in the ORDER BY clause. Since the values of that column aren't guaranteed, you have no guarantee what rows are going to be returned with subsequent invocations of this script even in the absence of changes to the underlying table.
So, I would at least change the ORDER BY clause to something like this:
ORDER BY MAX(created_date)
or this:
ORDER BY MIN(created_date)
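For completeness, the deterministic way to get what the asker described (one newest article per user, then the top 10 overall) is a greatest-n-per-group query rather than the bare GROUP BY. A sqlite3 sketch with the question's column names (sample data invented; this assumes created_date is unique per user):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE articles (
    id INTEGER PRIMARY KEY, user_id INTEGER, category_id INTEGER,
    city_id INTEGER, created_date TEXT)""")
# Three invented users, three articles each, all in category 1 / city 2.
conn.executemany(
    "INSERT INTO articles (user_id, category_id, city_id, created_date) "
    "VALUES (?, ?, ?, ?)",
    [(u, 1, 2, f"2024-01-{d:02d}")
     for u in (1, 2, 3) for d in (u, u + 10, u + 20)])

# Step 1 (subquery): each user's newest matching created_date.
# Step 2 (outer query): join back to the full rows, then sort and limit.
rows = conn.execute("""
    SELECT a.* FROM articles AS a
    JOIN (SELECT user_id, MAX(created_date) AS max_date
          FROM articles WHERE category_id = 1 AND city_id = 2
          GROUP BY user_id) AS m
      ON a.user_id = m.user_id AND a.created_date = m.max_date
    WHERE a.category_id = 1 AND a.city_id = 2
    ORDER BY a.created_date DESC LIMIT 10""").fetchall()
```

Unlike `SELECT * ... GROUP BY user_id`, every returned row here is a real, complete article row, and the ORDER BY operates on well-defined values.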
Some potential improvements (for best query performance):
Make sure you have an index on the columns you query. Note: check whether you really need an index on every column, because each index costs performance when the DB has to build and maintain it. For more details take a look here: http://dev.mysql.com/doc/refman/5.1/en/optimization-indexes.html
SELECT * selects all columns of the table. SELECT only the ones you really require...