I have this database: http://sqlfiddle.com/#!9/9c1f66/16
and I would like to produce an output like this for a certain DataEntry (in this case the one with id 1):
RowId | field1 | field2 | field3 | field4
1 | value_field1 | value_field2 | value_field3 | value_field4
I tried using pivot tables but I can't figure out how to do it properly.
SQL has a strict, unbreakable rule: the number of columns a query returns must be fixed before the query reads any data from your tables.
The only way around this is with dynamic SQL, where you complete the query over three steps:
Run a query to find out information about your target columns.
Use the information from the first step to build a complex new SQL statement on the fly, using a PIVOT, conditional aggregations, numerous JOINs to the same table, or some combination thereof.
Run the query from step 2.
The design you are pursuing will require you to jump through those hoops for pretty much every query you want to run. This will make application development and maintenance much more complex, it will make executing the query itself slow, and, possibly worst of all, it will break your ability to index your data effectively.
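Those three steps can be sketched end to end. Below is a minimal illustration in Python with SQLite, assuming a hypothetical `entry_fields` key/value table (the actual fiddle schema isn't reproduced here); in MySQL you would typically build the column list with GROUP_CONCAT and run step 3 via PREPARE/EXECUTE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry_fields (entry_id INTEGER, field_name TEXT, field_value TEXT);
INSERT INTO entry_fields VALUES
  (1, 'field1', 'value_field1'), (1, 'field2', 'value_field2'),
  (1, 'field3', 'value_field3'), (1, 'field4', 'value_field4');
""")

# Step 1: query the data to discover the column names.
fields = [r[0] for r in conn.execute(
    "SELECT DISTINCT field_name FROM entry_fields ORDER BY field_name")]

# Step 2: build a conditional-aggregation pivot query on the fly.
# NOTE: field names are interpolated into the SQL text; real code must
# validate/whitelist them to avoid injection.
cols = ", ".join(
    f"MAX(CASE WHEN field_name = '{f}' THEN field_value END) AS {f}"
    for f in fields)
sql = (f"SELECT entry_id AS RowId, {cols} "
       "FROM entry_fields WHERE entry_id = ? GROUP BY entry_id")

# Step 3: run the generated query.
row = conn.execute(sql, (1,)).fetchone()
print(row)  # (1, 'value_field1', 'value_field2', 'value_field3', 'value_field4')
```

The conditional-aggregation pattern here is essentially what a PIVOT clause generates for you on databases that have one.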
Related
While creating a notification system I ran across a question. The community the system is created for is rather big, and I have 2 ideas for my SQL tables:
Make one table which includes:
comments table:
id(AUTO_INCREMENT) | comment(text) | viewers_id(int) | date(datetime)
In this option, the comments are stored with a date, and all users that viewed the comment are listed in one column, separated by commas. For example:
1| Hi I'm a penguin|1,2,3,4|24.06.1879
The system should now use the column viewers_id to decide if it should show a notification or not.
Make two tables, like:
comments table:
id(AUTO_INCREMENT) | comment(text) | date(datetime)
viewer table:
id(AUTO_INCREMENT) | comment_id | viewers_id(int)
example:
5|I'm a rock|23.08.1778
1|5|1
2|5|2
3|5|3
4|5|4
In this example we check the viewers_id again.
Which of these is likely to have better performance?
In my opinion you shouldn't focus that much on optimizing your tables, since it's far more rewarding to optimize your application first.
Now to your question:
Increasing the performance of an SQL table can be achieved in two ways:
1. Normalize it. As with every SQL table, I would recommend normalization: Wikipedia - normalization
2. Reduce contention, i.e. reduce the amount of time during which data can't be accessed because it is being changed.
As for your example: if I had to pick one of those, I would pick the second option.
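To illustrate why the second option is easier to query, here is a minimal sketch in Python with SQLite (table and column names follow the question; the "unseen comments" query is one hypothetical way to drive the notification check):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       comment TEXT, date DATETIME);
CREATE TABLE viewer   (id INTEGER PRIMARY KEY AUTOINCREMENT,
                       comment_id INTEGER, viewers_id INTEGER);
INSERT INTO comments (comment, date) VALUES ('I''m a rock', '1778-08-23');
-- one row per (comment, viewer) pair: users 1-3 have seen comment 1
INSERT INTO viewer (comment_id, viewers_id) VALUES (1, 1), (1, 2), (1, 3);
""")

# Comments that user 4 has NOT yet seen -> show a notification for them.
unseen = conn.execute("""
    SELECT c.id, c.comment FROM comments c
    WHERE NOT EXISTS (SELECT 1 FROM viewer v
                      WHERE v.comment_id = c.id AND v.viewers_id = ?)
""", (4,)).fetchall()
print(unseen)  # [(1, "I'm a rock")]
```

With the comma-separated design of option 1, the same check would need string functions like FIND_IN_SET, which cannot use an index.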
I have three tables. Each news item has one or several categories.
News
-------------------------
| id | title | created
Category
-------------------------
| id | title
News_Category
-------------------------
| news_id | category_id
But my News table has many rows, about 10,000,000. Using joins to fetch data will be a performance issue.
SELECT title FROM News_Category LEFT JOIN News ON (News_Category.news_id = News.id)
GROUP BY News_Category.id ORDER BY News.created DESC LIMIT 10
I want the best query for this issue. For many-to-many relations across huge tables, which query has better performance?
Please give me the best query for this use case.
The best performance for that query comes from permanently storing its result; in other words, you need a materialized view.
MySQL has no native materialized views, but you can implement one by creating a table from the query.
That is:
CREATE TABLE FooMaterializedView AS
(SELECT foo1.*, foo2.* FROM foo1 JOIN foo2 ON ( ... ) WHERE ... ORDER BY ...);
Now, depending on how often the source tables change (i.e. receive inserts, updates, or deletes) and how fresh the query results need to be, you must implement a suitable view-maintenance strategy. That is, depending on your needs and the problem itself, perform either:
full recomputation (i.e. truncate the materialized view and regenerate it from scratch), which might be enough; or
incremental computation: if it is too costly to perform a full recomputation very often, you must capture only the changes on the source tables and update the materialized view according to those changes.
If you need to take the incremental approach, I can only wish you the best of luck. I can point out that you can use triggers to capture the changes on the source tables, and that you will need either an algorithmic or an equation-based approach to compute the changes to apply to the materialized view.
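For the full-computation strategy, the maintenance step is just "empty the table and re-run the defining query". A minimal sketch in Python with SQLite, using hypothetical `foo1`/`foo2` source tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foo1 (id INTEGER, a TEXT);
CREATE TABLE foo2 (id INTEGER, b TEXT);
INSERT INTO foo1 VALUES (1, 'x');
INSERT INTO foo2 VALUES (1, 'y');
-- initial build of the materialized view
CREATE TABLE foo_mv AS
  SELECT foo1.id, foo1.a, foo2.b FROM foo1 JOIN foo2 ON foo1.id = foo2.id;
""")

def refresh_foo_mv(conn):
    # Full recomputation: wipe the materialized view and rebuild it
    # from scratch using the same defining query.
    conn.execute("DELETE FROM foo_mv")
    conn.execute("""INSERT INTO foo_mv
        SELECT foo1.id, foo1.a, foo2.b
        FROM foo1 JOIN foo2 ON foo1.id = foo2.id""")

conn.execute("INSERT INTO foo2 VALUES (1, 'z')")  # a source table changes
refresh_foo_mv(conn)                              # periodic refresh
rows = conn.execute("SELECT * FROM foo_mv ORDER BY b").fetchall()
print(rows)  # [(1, 'x', 'y'), (1, 'x', 'z')]
```

In MySQL you would typically run such a refresh from a scheduled event or a cron job, trading staleness for query speed.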
I'm trying to make it quick and easy to perform a keyword search on a set of MySQL tables which are linked to each other.
There's a table of items with a unique "itemID" and associated data is spread out amongst other tables, all linked to via the itemID.
I've created a view which concatenates much of this information into one usable form. This makes searching really easy, but hasn't helped with performance. It's my first use of a view, and perhaps wasn't the right use. If anyone could give me some pointers I'd be very grateful.
A simplified example is:
ITEMS TABLE:
itemID | name
------ -------
1 "James"
2 "Bob"
3 "Mary"
KEYWORDS TABLE:
keywordID | itemID | keyword
------ ------- -------
1 2 "rabbit"
2 2 "dog"
3 3 "chicken"
plus many more relations...
MY VIEW: (created using CONCAT_WS, GROUP_CONCAT and a fair few JOINs)
itemID | important_search_terms
------ -------
1 "James ..."
2 "Bob, rabbit, dog ..."
3 "Mary, chicken ..."
I can then search the view for "mary" and "chicken" and easily find that itemID=3 matches. Brilliant!
The problem is that it seems to be doing all the work of the CONCATs and JOINs for each and every search, which is not efficient. With my current test data, searches take approximately 2 seconds, which is not practical.
I was hoping that the view would be cached in some way, but perhaps I'm not using it in the right way.
I could have an actual table with this search info which I update periodically, but it doesn't seem as neat as I had hoped.
If anyone has any suggestions I'd be very grateful. Many Thanks
Well, a view does nothing more than make your queries easier to read; underneath, the SQL statement it is defined on runs every single time.
So it is no wonder it is as slow as (or even slower than) running that statement directly.
Usually this kind of search is handled by indexing jobs (run at night, when they won't annoy anyone) or by indexing on insert (when new data is inserted, checks decide whether to add it to the index of interesting words).
Doing it at runtime is really hard, and it requires well-designed database structures and, most of the time, powerful hardware for the SQL server (depending on the amount of data).
A MySQL view is not the same as a materialized view in other database systems. All it really does is store the query itself, not the data that the query returns.
The primary use for a MySQL view is to eliminate repetitive queries that you have to write over and over again.
You've made it easy, but not made it quick. I think if you look at the EXPLAIN for your query you are going to see that MySQL is materializing that view (writing out a copy of the result set from the view query as a "derived table") each time you run the query, and then running a query from that "derived table".
You would get better performance if you can have the "search" predicate run against each table separately, something like this:
SELECT 'items' AS source, itemID, name AS found_term
FROM items WHERE name LIKE 'foo'
UNION ALL
SELECT 'keywords', itemID, keyword
FROM keywords WHERE keyword LIKE 'foo'
UNION ALL
SELECT 'others', itemID, other
FROM others WHERE other LIKE 'foo'
-or-
if you don't care what the matched term is or which table it was found in, and you just want a distinct list of the itemID values that matched:
SELECT itemID
FROM items WHERE name LIKE 'foo'
UNION
SELECT itemID
FROM keywords WHERE keyword LIKE 'foo'
UNION
SELECT itemID
FROM others WHERE other LIKE 'foo'
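For concreteness, here is the second form run against the example tables from the question, as a Python/SQLite sketch (the `others` table is omitted, and the search term is parameterized with the `%` wildcards added in code rather than hard-coded):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items    (itemID INTEGER, name TEXT);
CREATE TABLE keywords (keywordID INTEGER, itemID INTEGER, keyword TEXT);
INSERT INTO items VALUES (1, 'James'), (2, 'Bob'), (3, 'Mary');
INSERT INTO keywords VALUES (1, 2, 'rabbit'), (2, 2, 'dog'), (3, 3, 'chicken');
""")

def search(conn, term):
    # Run the search predicate against each table separately and let
    # UNION deduplicate the matching itemIDs.
    pattern = f"%{term}%"
    return [r[0] for r in conn.execute("""
        SELECT itemID FROM items    WHERE name    LIKE ?
        UNION
        SELECT itemID FROM keywords WHERE keyword LIKE ?
    """, (pattern, pattern)).fetchall()]

print(search(conn, "chicken"))  # [3]
print(search(conn, "Mary"))     # [3]
```

Each branch can now use its own table's indexes (at least for prefix patterns), instead of forcing the concatenated view to be materialized on every search.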
I am trying to do a 'fuzzy' search on an SQL table with names of people.
This is the table:
+----+------------+
| T1 | T2 |
+----+------------+
| 1 | Last,First |
| 2 | Last,First |
| 3 | Last,First |
+----+------------+
I want a SELECT statement that queries T2 with the LIKE operator such that it works even when the search input is "First Last".
The only way I can think of is splitting the values, concatenating them in the other order, and then searching again for the entry. Is there a better way to do this?
Yes, if there's a possibility you may put both last, first and first last into the database, the better way is to design your schema properly.
If you ever find yourself trying to search on, or otherwise manipulate, parts of columns, your schema is almost certainly broken. It will almost certainly kill performance.
The correct way is to have the table thus:
T1 FirstName LastName
== ========= ========
1 Pax Diablo
2 Bob Smith
3 George Jones
Then you can more efficiently split the user-entered name (once, before running the query) rather than trying to split every single name in the database.
In the case where the database always holds last, first, it may not actually be necessary for a schema change.
The problem you have in that case is simply one of interpreting what the user entered.
One possibility, although it is a performance killer, is to do a like for each separate word. So, if the user entered pax diablo, your resultant query might be:
select T1 from mytable
where T2 like '%pax%'
and T2 like '%diablo%'
That way, you don't care about the order so much.
However, given my dislike of slow queries, I'd try to steer clear of that unless absolutely necessary (or your database is relatively small and likely to stay that way).
There are all sorts of ways to speed up these sorts of queries, such as:
using whatever full-text search capabilities your DBMS has.
emulating such abilities by extracting and storing words during insert/update triggers (and removing them during delete triggers).
the previous case, but also adding extra columns that hold lower-cased values of the current column (for speed).
telling the user they need to use the last, first form for searching.
trying to avoid the %something% search string as much as possible (with something%, indexes can still be used).
my previously mentioned "split the name into two columns" method.
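The "split the name into two columns" method can be sketched like this in Python with SQLite (exact matching is used for simplicity; the key point is that the user input is split once, before the query runs, instead of splitting every row in the database):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (T1 INTEGER, FirstName TEXT, LastName TEXT);
INSERT INTO people VALUES (1, 'Pax', 'Diablo'), (2, 'Bob', 'Smith'),
                          (3, 'George', 'Jones');
""")

def find_person(conn, user_input):
    # Split the user-entered name once, before running the query,
    # and accept either "First Last" or "Last, First" order.
    parts = [p.strip() for p in user_input.replace(',', ' ').split()]
    first, last = parts[0], parts[-1]
    return conn.execute("""
        SELECT T1 FROM people
        WHERE (FirstName = ? AND LastName = ?)
           OR (FirstName = ? AND LastName = ?)
    """, (first, last, last, first)).fetchall()

print(find_person(conn, "Pax Diablo"))   # [(1,)]
print(find_person(conn, "Diablo, Pax"))  # [(1,)]
```

Both equality predicates can use ordinary indexes on FirstName and LastName, unlike a `%word%` pattern over the combined column.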
You can try this one. But better yet, restructure your table's schema by separating the names:
SELECT *
FROM myTable
WHERE T2 LIKE CONCAT('%', 'First', '%','Last', '%') OR
T2 LIKE CONCAT('%', 'Last', '%','First', '%')
Currently I'm running these two queries:
SELECT COUNT(*) FROM `mytable`
SELECT * FROM `mytable` WHERE `id`=123
I'm wondering what format will be the most efficient. Does the order the queries are executed make a difference? Is there a single query that will do what I want?
Both queries are fairly unrelated. The COUNT doesn't use any indexes, while the SELECT likely uses the primary key for a fast look-up. The only thing the queries have in common is the table.
Since these are so simple, the query optimizer and results cache shouldn't have a problem performing very well on these queries.
Are they causing you performance problems? If not, don't bother optimizing them.
Does the order the queries are executed make a difference?
No, they read different things. The count will read a stored value containing the number of rows in the table (on MyISAM), while the select by id will use the primary-key index. Both are fast and simple.
Is there a single query that will do what I want?
Yes, but it will make your code less clear and less maintainable (due to mixing concepts), and in the best case it will not improve performance (it will probably make it worse).
If you really, really want to group them somehow, create a stored procedure; but unless you use this pair of queries a lot or in several places in the code, that can be overkill.
First off: Ben S. makes a good point. This is not worth optimizing.
But if one wants to put those two statements into one SQL statement, I think this is one way to do it:
select *,count(*) from mytable
union all
select *,-1 from mytable where id = 123
This will give one row for the COUNT(*) (where one ignores all but the last column) and as many rows as match id = 123 (where one ignores the last column, as it is always -1). Note that selecting non-aggregated columns alongside COUNT(*) without a GROUP BY relies on MySQL's permissive behavior with ONLY_FULL_GROUP_BY disabled.
Like this:
| Column1 | Column2 | Column3 | ColumnN | Count(*) Column |
---------------------------------------------------------------
| ignore | ignore | ignore | ignore | 4711 |
|valid data|valid data|valid data|valid data| -1 (ignore) |
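The -1 marker also makes it easy to separate the two kinds of rows in application code. A small Python/SQLite sketch (SQLite, like MySQL's traditional mode, tolerates bare columns next to COUNT(*)):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, val TEXT);
INSERT INTO mytable VALUES (123, 'hello'), (124, 'world'), (125, '!');
""")

rows = conn.execute("""
    SELECT id, val, COUNT(*) FROM mytable
    UNION ALL
    SELECT id, val, -1 FROM mytable WHERE id = 123
""").fetchall()

# The count row is the one whose marker column is not -1; in it we ignore
# everything but the last column. In the data rows we ignore the marker.
total = [r[-1] for r in rows if r[-1] != -1][0]
data  = [r[:-1] for r in rows if r[-1] == -1]
print(total, data)  # 3 [(123, 'hello')]
```

In practice, issuing the two simple queries separately is clearer and no slower.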
What table engine are you using?
SELECT COUNT(*) is faster on MyISAM than on InnoDB. MyISAM stores the number of rows for each table, so an unfiltered COUNT(*) simply returns that stored value. InnoDB doesn't do this because it supports transactions, and different transactions may see different row counts.
More info:
http://www.mysqlperformanceblog.com/2006/12/01/count-for-innodb-tables/