MySQL CONCAT '*' symbol toasts the database

I have a table used for lookups which stores the human-readable value in one column and the same text, stripped of special characters and spaces, in another. For example, the value "Children's Shows" would appear in the lookup column as "childrens-shows".
Unfortunately the corresponding main table isn't quite that simple: for historical reasons (a design I didn't create myself, and one that would now be difficult to undo), the lookup value is actually stored with surrounding asterisks, e.g. '*childrens-shows*'.
So, while trying to join the lookup table (sans asterisks) with the main table (with asterisks), I figured CONCAT would help me add them on the fly, e.g.:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = CONCAT('*',m.value,'*')
... and then the table was toast. Not sure if I created an infinite loop or really screwed the data, but it required an ISP backup to get the table responding again. I suspect it's because the '*' symbol is probably reserved, like a wildcard, and I've asked the database to do the equivalent of licking its own elbow. Either way, I'm hesitant to 'experiment' to find the answer given the spectacular way it managed to kill the database.
Thanks in advance to anyone who can tell me (a) what the above actually did to the database, and (b) how I should actually join the tables.

When a column is wrapped in a function such as CONCAT, MySQL can't use an index on that column for the comparison. Use EXPLAIN to check this: in a recent problem I had on a large table, the indexed column was there, but the key was not used. This should not bork the whole table, however, just make the query slow. Possibly it ran out of memory, started to swap and then crashed halfway, but you'd need to check the logs to find out.
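On part (a): a SELECT only reads rows, and inside a quoted string '*' is an ordinary character (wildcards only apply with LIKE, where they are % and _), so the asterisks themselves aren't special. Note also that the question's ON clause adds asterisks to m.value, which already has them, so it compares '**childrens-shows**' with 'childrens-shows' and can never match. A minimal sketch, using hypothetical temporary tables that mirror the question's (the label column is an assumption), puts CONCAT on the lookup side instead, so the bare main_table.value column stays indexable:

```sql
-- Hypothetical stand-ins for the question's tables (label is an assumed column).
-- TEMPORARY tables last only for this session and shadow any real table of the
-- same name, so trying this can't touch existing data.
CREATE TEMPORARY TABLE lookup_table (
  label VARCHAR(100) NOT NULL,   -- human-readable, e.g. 'Children''s Shows'
  value VARCHAR(100) NOT NULL    -- stripped, e.g. 'childrens-shows'
);
CREATE TEMPORARY TABLE main_table (
  id    INT PRIMARY KEY,
  value VARCHAR(100) NOT NULL,   -- stored with asterisks, e.g. '*childrens-shows*'
  INDEX idx_value (value)
);
INSERT INTO lookup_table VALUES ('Children''s Shows', 'childrens-shows');
INSERT INTO main_table VALUES (1, '*childrens-shows*'), (2, '*news*');

-- Wrap the lookup value in asterisks, so main_table.value is compared as-is
-- and idx_value can be used. Prefix the query with EXPLAIN to check the plan.
SELECT m.id, m.value, l.label
FROM lookup_table l
INNER JOIN main_table m
  ON m.value = CONCAT('*', l.value, '*');
```

With the sample rows this returns only main_table row 1.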
However, the root cause is clearly bad table design and that's where the solution lies. Any answer you get that allows you to work around this can only be temporary at best.
Best solution is to move this data into a separate table. 'Childrens shows' sounds like a category and therefore repeated data in many rows. This should really be an id for a 'categories' table, which would prevent the DB from having to run CONCAT on every single row in the table, as you could do this:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = m.value
/* and optionally */
INNER JOIN categories cat
ON l.value = cat.id
WHERE cat.name = 'whatever'
I know this is not something you may be able to do given the information you supplied in your question, but the reason you can't make such a change to a badly normalised DB is more important than the code here. Without either the resources or the political backing to do things the right way, you will end up with even more headaches like this, which will cost more in the long term. Time for a word with the boss perhaps :)
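A minimal sketch of the normalised layout recommended above, assuming hypothetical tables categories and shows (shows stands in for main_table) with an integer category_id column. Temporary tables are used so it can be tried safely:

```sql
-- Hypothetical normalised layout: each category stored once, referenced by id.
CREATE TEMPORARY TABLE categories (
  id   INT PRIMARY KEY,
  slug VARCHAR(100) NOT NULL UNIQUE,   -- e.g. 'childrens-shows'
  name VARCHAR(100) NOT NULL           -- e.g. 'Children''s Shows'
);
-- 'shows' stands in for main_table; a real schema would also declare
-- FOREIGN KEY (category_id) REFERENCES categories (id).
CREATE TEMPORARY TABLE shows (
  id          INT PRIMARY KEY,
  title       VARCHAR(200) NOT NULL,
  category_id INT NOT NULL,
  INDEX idx_category (category_id)
);
INSERT INTO categories VALUES (1, 'childrens-shows', 'Children''s Shows'), (2, 'news', 'News');
INSERT INTO shows VALUES (10, 'Cartoon Hour', 1), (11, 'Evening Bulletin', 2), (12, 'Puppet Time', 1);

-- A plain integer equality join: no CONCAT on any row, and both keys are indexed.
SELECT s.id, s.title, c.name AS category
FROM shows s
INNER JOIN categories c ON c.id = s.category_id
WHERE c.slug = 'childrens-shows';
```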

Related

Inner joining within an inner join

I tried to find out whether this had already been answered but couldn't find anything. I'm trying to join four tables, but one of the joins isn't on the table that the other two joins are from. I've successfully joined three of the tables; I'm just not sure of the syntax for the last join.
SELECT * FROM
nc_booking
INNER JOIN
nc_customer ON nc_booking.c_id = nc_customer.id
INNER JOIN
nc_propertys ON nc_booking.p_id = nc_propertys.id
How would I now join nc_propertys to another table, nc_owner?
Building on the code from Gordon Linoff's answer, to add your extra table you need to do something like:
SELECT *
FROM nc_booking b INNER JOIN
nc_customer c
ON b.c_id = c.id INNER JOIN
nc_propertys p
ON b.p_id = p.id INNER JOIN
nc_owner o
ON o.id = p.o_id;
You haven't shared the column names we need to use to connect the extra table, so the last line might not be right. A few things to note ...
(1) The SELECT * is not ideal. If you only need particular columns here, list them. I've stuck with your * because I don't know what you want from the tables. Where a column with the same name exists in more than one table, you'll have to "fully qualify" the field name as follows ...
SELECT c.id as customer_id,
-- more fields can go here, with a comma after each
...
Several of the joined tables have an id field, so the c. is necessary to tell the database which one we want. Notice that as with the tables, we can also give the fields an 'alias', which in this case is 'customer_id'. This can be very helpful for presentation, and is often essential when using the output from a query as part of a larger piece of code.
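A complete version of point (1), sketched with hypothetical minimal versions of the four tables. The columns name, title and o_id are assumptions, since the question didn't share the real column names:

```sql
-- Hypothetical minimal tables; columns beyond id, c_id and p_id are assumed.
CREATE TEMPORARY TABLE nc_owner     (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TEMPORARY TABLE nc_propertys (id INT PRIMARY KEY, title VARCHAR(50), o_id INT);
CREATE TEMPORARY TABLE nc_customer  (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TEMPORARY TABLE nc_booking   (id INT PRIMARY KEY, c_id INT, p_id INT);

INSERT INTO nc_owner     VALUES (1, 'Olive');
INSERT INTO nc_propertys VALUES (5, 'Sea View Cottage', 1);
INSERT INTO nc_customer  VALUES (7, 'Carl');
INSERT INTO nc_booking   VALUES (100, 7, 5);

-- Every table has an id (and two have a name), so each column is qualified
-- with its table alias and given a distinct output alias.
SELECT b.id    AS booking_id,
       c.id    AS customer_id,
       c.name  AS customer_name,
       p.title AS property_title,
       o.name  AS owner_name
FROM nc_booking b
INNER JOIN nc_customer  c ON b.c_id = c.id
INNER JOIN nc_propertys p ON b.p_id = p.id
INNER JOIN nc_owner     o ON o.id = p.o_id;
```

With the sample data this returns one row, for booking 100.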
(2) Since all the joins are INNER JOINs, it makes little (if any) difference in what order the tables are listed, as long as the connections between them remain the same.
(3) For MySQL, it technically shouldn't matter whether you have lots of new-lines or none at all. SQL is designed to ignore "white space" (except within data). What matters is simply laying out your code so it is easy to read ... especially for other users who later might need to figure out what you were doing (although in my experience also for you, when you return to a piece of code several years later and can't remember it at all).
(4) In each ON clause it doesn't actually matter whether you write, say, a = b or b = a. That's because you aren't setting one to equal the other; you are requiring that they already be equal, so it amounts to the same thing either way.
My advice to a SQL beginner would be when you are writing a SELECT query (which only reads and doesn't change any data): if you aren't too sure then write some code and set it to run. If it's completely invalid, your software should give you some idea of what is wrong and no harm will be done. If it's valid but wrong, the very worst that can happen is that you put some unnecessary load on your database server ... if it takes a long time to run and you weren't expecting it to, then you should be able to cancel the query. As long as you have some idea of what you expect the results to look like, and roughly how many rows to expect, you won't go too far wrong. If you get completely stuck come back here to Stack Overflow.
Things get a bit different if you are writing code which DELETEs or UPDATEs data. Then you want to know exactly what you're up to. Normally you can write a closely related SELECT statement first to make sure you're going to be making all and only the changes you were expecting. It's also best to make sure you've got a way to undo your changes should the worst happen. Backups are obviously good, and you can often create your own backup copy of a table before you make any alterations. You don't necessarily need to rely on backup software or your in-house IT guys for that ... in my experience they don't like databases anyway.
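A minimal sketch of that preview, back up, then change workflow, using a hypothetical prices table. On a real table you'd make a permanent copy with CREATE TABLE ... AS SELECT (which copies the rows but not the indexes), and ROLLBACK only helps on a transactional engine such as InnoDB, not MyISAM:

```sql
-- Hypothetical table and change, purely to illustrate the workflow.
CREATE TEMPORARY TABLE prices (id INT PRIMARY KEY, amount DECIMAL(8,2));
INSERT INTO prices VALUES (1, 10.00), (2, 25.00), (3, 40.00);

-- 1. Preview: the same WHERE clause the UPDATE will use, showing old and new values.
SELECT id, amount, amount * 1.10 AS new_amount
FROM prices
WHERE amount < 30;

-- 2. Take your own copy of the table before changing anything.
CREATE TEMPORARY TABLE prices_backup AS SELECT * FROM prices;

-- 3. Make the change inside a transaction so it can still be undone.
START TRANSACTION;
UPDATE prices SET amount = amount * 1.10 WHERE amount < 30;
-- Inspect the result here; run ROLLBACK instead of COMMIT if it looks wrong.
COMMIT;
```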
Also there are some great books out there. For a beginner, I'd recommend anything by Ben Forta, including his SQL in 10 Minutes (that's a per chapter figure), or his MySQL Crash Course (the latter is a little old though, so won't have anything on the more recently added features of MySQL).
Your syntax looks okay. I am providing an answer because you really should learn to use table aliases. They make a query easier to write and to read:
SELECT *
FROM nc_booking b INNER JOIN
nc_customer c
ON b.c_id = c.id INNER JOIN
nc_propertys p
ON b.p_id = p.id;

Right way to phrase MySQL query across many (possibly empty) tables

I'm trying to do what I think are simple set operations on a database table: several intersections and one union. But I don't seem to be able to express that in a simple way.
I have a MySQL table called Moment, which has many millions of rows. (It happens to be a time-series table but that doesn't impact on my problem here; however, these data have a column 'source' and a column 'time', both indexed.) Queries to pull data out of this table are created dynamically (coming in from an API), and ultimately boil down to a small pile of temporary tables indicating which 'source's we care about, and maybe the 'time' ranges we care about.
Let's say we're looking for
(source in Temp1) AND (
((source in Temp2) AND (time > '2017-01-01')) OR
((source in Temp3) AND (time > '2016-11-15'))
)
Just for excitement, let's say Temp2 is empty --- that part of the API request was valid but happened to include 'no actual sources'.
If I then do
SELECT m.* from Moment as m,Temp1,Temp2,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... I get a heaping mound of nothing, because the empty Temp2 gives an empty Cartesian product before we get to the WHERE clause.
Okay, I can do
SELECT m.* from Moment as m
LEFT JOIN Temp1 on m.source=Temp1.source
LEFT JOIN Temp2 on m.source=Temp2.source
LEFT JOIN Temp3 on m.source=Temp3.source
WHERE (m.source = Temp1.source) AND (
((m.source = Temp2.source) AND (m.time > '2017-01-01')) OR
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... but this takes >70ms even on my relatively small development database.
If I manually eliminate the empty table,
SELECT m.* from Moment as m,Temp1,Temp3
WHERE (m.source = Temp1.source) AND (
((m.source = Temp3.source) AND (m.time > '2016-11-15'))
)
... it finishes in 10ms. That's the kind of time I'd expect.
I've also tried putting a single unmatchable row in the empty table and doing SELECT DISTINCT, and it splits the difference at ~40ms. Seems an odd solution though.
This really feels like I'm just conceptualizing the query wrong, that I'm asking the database to do more work than it needs to. What is the Right Way to ask the database this question?
Thanks!
--UPDATE--
I did some actual benchmarks on my actual database, and came up with some really unexpected results.
For the scenario above, all tables indexed on the columns being compared, with an empty table,
doing it with left joins took 3.5 minutes (!!!)
doing it without joins (just 'FROM...WHERE') and adding a null row to the empty table, took 3.5 seconds
even more striking, when there wasn't an empty table, but rather ~1000 rows in each of the temporary tables,
doing the whole thing in one query took 28 minutes (!!!!!), but,
doing each of the three AND clauses separately and then doing the final combination in the code took less than a second.
I still feel I'm expressing the query in some foolish way, since again, all I'm trying to do is one set union (OR) and a few set intersections. It really seems like the DB is making this gigantic Cartesian product when it seriously doesn't need to. All in all, as pointed out in the answer below, keeping some of the intelligence up in the code seems to be the better approach here.
There are various ways to tackle the problem. Needless to say it depends on
how many queries are sent to the database,
the amount of data you are processing in a time interval,
how the database backend is configured to manage it.
For your use case, a little more information would be helpful. One option would be to optimize the query with CASE/COUNT(*) or CASE/LIMIT combinations that sort out empty tables. However, such conditional logic adds its own cost.
You could split the SQL into several smaller queries (as in your update, where running each branch separately was much faster), changing the scaling of the problem from 1*N^x to y*N^z, where z should be smaller than x.
You said that an API is involved; maybe you can handle the temporary "no data" tables differently, or not store them at all?
Another option would be to enable query caching (note that the query cache was deprecated in MySQL 5.7 and removed in MySQL 8.0):
https://dev.mysql.com/doc/refman/5.5/en/query-cache-configuration.html
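One way to state the intended set logic directly, sketched with hypothetical miniature tables (the id column and the (source, time) index are assumptions): test membership with IN (SELECT ...) semi-joins instead of listing the temporary tables in FROM. An empty Temp2 then just makes its IN test false, no Cartesian product is built, and duplicate sources in a temp table don't multiply result rows. Each temporary table is referenced only once, which matters because MySQL can't refer to a TEMPORARY table more than once in the same query (a UNION whose branches both read Temp1 would fail with "Can't reopen table"). Whether this beats splitting the query depends on your data, so compare with EXPLAIN on real volumes:

```sql
-- Hypothetical miniature of the question's tables.
CREATE TEMPORARY TABLE Moment (
  id     INT PRIMARY KEY,
  source INT NOT NULL,
  time   DATETIME NOT NULL,
  INDEX idx_source_time (source, time)
);
CREATE TEMPORARY TABLE Temp1 (source INT PRIMARY KEY);
CREATE TEMPORARY TABLE Temp2 (source INT PRIMARY KEY);  -- deliberately left empty
CREATE TEMPORARY TABLE Temp3 (source INT PRIMARY KEY);

INSERT INTO Moment VALUES
  (1, 1, '2016-12-01'), (2, 1, '2015-01-01'),
  (3, 2, '2017-02-01'), (4, 3, '2017-03-01');
INSERT INTO Temp1 VALUES (1), (2);
INSERT INTO Temp3 VALUES (1);

-- (source in Temp1) AND ((source in Temp2 AND time > ...) OR (source in Temp3 AND time > ...))
SELECT m.*
FROM Moment AS m
WHERE m.source IN (SELECT source FROM Temp1)
  AND (   (m.source IN (SELECT source FROM Temp2) AND m.time > '2017-01-01')
       OR (m.source IN (SELECT source FROM Temp3) AND m.time > '2016-11-15'));
```

With this data only Moment row 1 comes back: the Temp3 branch still matches even though Temp2 is empty.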

MySQL alter table with JOIN

I am trying to update a table with a column from another table. I don't want to view the join, I want to alter the table.
However, this is failing:
UPDATE
a_dataset
SET
a_dataset.lang_flag = b_dataset.language
FROM
a_dataset
INNER JOIN
b_dataset
ON
a_dataset.ID = b_dataset.ID
However, I keep getting a syntax error, and cannot work out what I am missing.
I am guessing that you mean to update your records when you say alter the table. If so, you can simply rewrite your update statement with join like this:
UPDATE a_dataset a
JOIN b_dataset b ON a.ID = b.ID
SET a.lang_flag = b.language;
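A self-contained sketch of that statement, using hypothetical minimal versions of both tables. In MySQL's multiple-table UPDATE the joined tables come straight after UPDATE and there is no FROM clause (the FROM form in the question is SQL Server style). With an inner JOIN, rows in a_dataset that have no match in b_dataset are left unchanged:

```sql
-- Hypothetical minimal tables using the question's names.
CREATE TEMPORARY TABLE a_dataset (ID INT PRIMARY KEY, lang_flag VARCHAR(10));
CREATE TEMPORARY TABLE b_dataset (ID INT PRIMARY KEY, language VARCHAR(10));
INSERT INTO a_dataset VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO b_dataset VALUES (1, 'en'), (2, 'fr');

-- Preview which rows will change, using the same join as the UPDATE.
SELECT a.ID, a.lang_flag, b.language
FROM a_dataset a
JOIN b_dataset b ON a.ID = b.ID;

-- Multiple-table UPDATE: join first, then SET.
UPDATE a_dataset a
JOIN b_dataset b ON a.ID = b.ID
SET a.lang_flag = b.language;
```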
As Uueerdo and I said: starting table names with numbers is a bad[TM] idea. The same goes for letters, which you have now chosen to use: a is no better than 1 in this regard. Calling tables just "dataset" isn't really helpful either. What is the table storing? Users? Then call it users. Articles on a news web site? Then call it articles. And so on. Everything in a database is a dataset; there's no need to tell anyone that.
I guess you're new to SQL, am I right? Because there's another issue: unless you're going to drop table b_dataset after this command, you're probably doing something you're not supposed to do in relational databases. The whole idea is to store each piece of data only once. If you can automagically copy the column from b to a, then you could also join a and b when you need the value, instead of copying it.
For learning SQL (or anything else), Stack Overflow is probably a bad place (it's good for asking questions along the way, though), so I recommend that you find someone with SQL experience to teach you, or get a book or tutorial on SQL. At first glance, this seems to be a good online book: http://sql.learncodethehardway.org/ - but I haven't read it.

How to efficiently design MySQL database for my particular case

I am developing a forum in PHP MySQL. I want to make my forum as efficient as I can.
I have made these two tables
tbl_threads
tbl_comments
Now, the problem is that there is a Like and a Dislike button under each comment. I have to store the user_name of whoever clicked the Like or Dislike button, along with the comment_id. I have made a column user_likes and a column user_dislikes in tbl_comments to store comma-separated user_names. But on this forum, I have read that this is not an efficient way. I have been advised to create a third table to store the Likes and Dislikes, so that my database design complies with 1NF.
But the problem is, If I make a third table tbl_user_opinion and make two fields like this
1. comment_id
2. type (like or dislike)
So will I have to run as many SQL queries as there are comments on my page to get the like and dislike data for each comment? Won't that be inefficient? I think there is some confusion on my part here. Can someone clarify this?
You have a Relational Scheme like this:
There are two ways to solve this. The first one, the "clean" one is to build your "like" table, and do "count(*)'s" on the appropriate column.
The second one would be to store a counter in each comment, indicating how many up-votes and down-votes it has received.
If you want to check whether a specific user has voted on the comment, you only have to check one entry, which you can easily handle as a separate query and merge the two results outside of your database (for this, use a query returning the comment_id and the kind of vote the user has cast in a specific thread).
Your approach with a comma-separated list doesn't perform well, because you can't query it without extra logic or a huge amount of string parsing. If you have a database - use it!
("One Information - One Dataset"!)
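To address the worry in the question: no, one query per comment isn't needed. A sketch with a hypothetical tbl_user_opinion that also stores user_name (needed to know who voted, and missing from the two-field design in the question) and an assumed thread_id column on tbl_comments; a single GROUP BY query returns the counts for every comment in a thread:

```sql
CREATE TEMPORARY TABLE tbl_comments (
  comment_id INT PRIMARY KEY,
  thread_id  INT NOT NULL,
  body       TEXT,
  INDEX idx_thread (thread_id)
);
CREATE TEMPORARY TABLE tbl_user_opinion (
  comment_id INT NOT NULL,
  user_name  VARCHAR(50) NOT NULL,
  type       ENUM('like', 'dislike') NOT NULL,
  PRIMARY KEY (comment_id, user_name)   -- at most one vote per user per comment
);
INSERT INTO tbl_comments VALUES (1, 42, 'First!'), (2, 42, 'Nice thread'), (3, 42, 'No votes yet');
INSERT INTO tbl_user_opinion VALUES
  (1, 'alice', 'like'), (1, 'bob', 'like'), (1, 'carol', 'dislike'),
  (2, 'alice', 'dislike');

-- One query for the whole page; COUNT(CASE ...) ignores NULLs, so comments
-- with no votes (kept by the LEFT JOIN) get 0 rather than NULL.
SELECT c.comment_id,
       COUNT(CASE WHEN o.type = 'like'    THEN 1 END) AS likes,
       COUNT(CASE WHEN o.type = 'dislike' THEN 1 END) AS dislikes
FROM tbl_comments c
LEFT JOIN tbl_user_opinion o ON o.comment_id = c.comment_id
WHERE c.thread_id = 42
GROUP BY c.comment_id
ORDER BY c.comment_id;
```

For the sample data this gives 2 likes and 1 dislike for comment 1, 0 and 1 for comment 2, and 0 and 0 for comment 3.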
The comma-separated list violates the principle of atomicity, and therefore 1NF. You'll have a hard time maintaining referential integrity and, for the most part, querying as well.
Here is one way to do it in a normalized fashion:
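The answer's model can be sketched as DDL consistent with its description: separate UP_VOTE and DOWN_VOTE tables whose primary key {COMMENT_ID, USER_ID} is the InnoDB clustering key, plus the reverse {USER_ID, COMMENT_ID} index for per-user lookups. The USER table, the BODY column and the foreign keys are assumptions or omissions of this sketch, and temporary tables are used for safe experimentation:

```sql
CREATE TEMPORARY TABLE COMMENT (
  COMMENT_ID INT PRIMARY KEY,
  BODY       TEXT
);
-- The composite primary key is InnoDB's clustered index, so all up-votes
-- for one comment are stored physically next to each other.
CREATE TEMPORARY TABLE UP_VOTE (
  COMMENT_ID INT NOT NULL,
  USER_ID    INT NOT NULL,
  PRIMARY KEY (COMMENT_ID, USER_ID),
  INDEX UP_VOTE_USER (USER_ID, COMMENT_ID)    -- "which comments did this user vote on?"
);
CREATE TEMPORARY TABLE DOWN_VOTE (
  COMMENT_ID INT NOT NULL,
  USER_ID    INT NOT NULL,
  PRIMARY KEY (COMMENT_ID, USER_ID),
  INDEX DOWN_VOTE_USER (USER_ID, COMMENT_ID)
);
INSERT INTO COMMENT   VALUES (1, 'First!');
INSERT INTO UP_VOTE   VALUES (1, 10), (1, 11), (1, 12);
INSERT INTO DOWN_VOTE VALUES (1, 13);
```

With this data, the query that follows (with <other COMMENT fields> removed and <whatever> set to 1) gives a SCORE of 2.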
This is very clustering-friendly: it groups up-votes belonging to the same comment physically close together (ditto for down-votes), making the following query rather efficient:
SELECT
COMMENT.COMMENT_ID,
<other COMMENT fields>,
COUNT(DISTINCT UP_VOTE.USER_ID) - COUNT(DISTINCT DOWN_VOTE.USER_ID) SCORE
FROM COMMENT
LEFT JOIN UP_VOTE
ON COMMENT.COMMENT_ID = UP_VOTE.COMMENT_ID
LEFT JOIN DOWN_VOTE
ON COMMENT.COMMENT_ID = DOWN_VOTE.COMMENT_ID
WHERE
COMMENT.COMMENT_ID = <whatever>
GROUP BY
COMMENT.COMMENT_ID,
<other COMMENT fields>;
Please measure on realistic amounts of data whether that works fast enough for you. If not, then denormalize the model: cache the total score in the COMMENT table, and keep it current through triggers every time a new row is inserted into or deleted from the *_VOTE tables.
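A sketch of that trigger-maintained score. MySQL doesn't allow triggers on TEMPORARY tables, so this uses hypothetical permanent tables with a demo_ prefix to avoid clashing with real ones (creating triggers needs the TRIGGER privilege). Each trigger body is a single statement, so no DELIMITER change is needed in the mysql client. Updates that move a vote to a different comment aren't handled here:

```sql
-- Hypothetical demo tables; score is the denormalised "ups minus downs".
CREATE TABLE demo_comment (
  comment_id INT PRIMARY KEY,
  score      INT NOT NULL DEFAULT 0
);
CREATE TABLE demo_up_vote (
  comment_id INT NOT NULL,
  user_id    INT NOT NULL,
  PRIMARY KEY (comment_id, user_id)
);
CREATE TABLE demo_down_vote (
  comment_id INT NOT NULL,
  user_id    INT NOT NULL,
  PRIMARY KEY (comment_id, user_id)
);

-- One trigger per table and event; each adjusts the cached score by one.
CREATE TRIGGER demo_up_vote_ai AFTER INSERT ON demo_up_vote FOR EACH ROW
  UPDATE demo_comment SET score = score + 1 WHERE comment_id = NEW.comment_id;
CREATE TRIGGER demo_up_vote_ad AFTER DELETE ON demo_up_vote FOR EACH ROW
  UPDATE demo_comment SET score = score - 1 WHERE comment_id = OLD.comment_id;
CREATE TRIGGER demo_down_vote_ai AFTER INSERT ON demo_down_vote FOR EACH ROW
  UPDATE demo_comment SET score = score - 1 WHERE comment_id = NEW.comment_id;
CREATE TRIGGER demo_down_vote_ad AFTER DELETE ON demo_down_vote FOR EACH ROW
  UPDATE demo_comment SET score = score + 1 WHERE comment_id = OLD.comment_id;

INSERT INTO demo_comment (comment_id) VALUES (1);
INSERT INTO demo_up_vote VALUES (1, 10), (1, 11), (1, 12);       -- score 3
INSERT INTO demo_down_vote VALUES (1, 13);                       -- score 2
DELETE FROM demo_up_vote WHERE comment_id = 1 AND user_id = 12;  -- score 1
```

When you're done experimenting, DROP TABLE demo_comment, demo_up_vote, demo_down_vote removes the tables along with their triggers.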
If you also need to get which comments a particular user voted on, you'll need indexes on *_VOTE {USER_ID, COMMENT_ID}, i.e. the reverse of the primary/clustering key above.1
1 This is one of the reasons why I didn't go with just one VOTE table containing an additional field that can be either 1 (for up-vote) or -1 (for down-vote): it's less efficient to cover with secondary indexes.

Is it necessary or beneficial to use a "stuff.whatever" naming convention in MySQL?

I'm working on a MySQL database for a social network site I'm building, and so far it's been a great learning experience. However, there has been one thing in particular that's always confused me.
When seeking answers to a particular issue, I see so many examples that use dots in their naming conventions in their MySQL queries. For example:
SELECT c.id, c.comment, c.user_id, u.username, u.photo
FROM (comments c)
JOIN users u ON c.user_id = u.id
WHERE c.topic_id = 9
and here is another example:
SELECT fb.auto_id, fb.user_id, fb.bulletin, fb.subject, fb.color, fb.submit_date, fru.disp_name, fru.pic_url
FROM friend_bulletin AS fb
LEFT JOIN friend_friend AS ff ON fb.user_id = ff.userid
LEFT JOIN friend_reg_user AS fru ON fb.user_id = fru.auto_id
WHERE (
ff.friendid =1
AND ff.status =1
)
LIMIT 0 , 30
Is there a particular benefit to using the dots in the names? As someone who comes from doing a lot of CSS work, at first glance the dots appear to me as some kind of association between different things, but what are they for here?
I suppose I just want to make sure I'm not making my database structure/queries less efficient by not using this 'dot' naming convention. If someone could explain to me in layman's terms what they are used for, I'd really appreciate it.
stuff.whatever should be thought of as table_name.column_name. You're explicitly associating each column reference with the table it belongs to which, IMHO, is a best practice to follow.
You shouldn't think of the dots as being part of a "naming convention." The functionality is more similar to calling an attribute on an object.
In the case of stuff.whatever, 'stuff' represents the database table and 'whatever' represents the data in a column called 'whatever' in that table.
If you've seen a column referenced without the table portion, it is because the user is expecting mysql to figure out which column they mean.
If there is only one table in the query with a column of that name, mysql can do it, no problem.
But suppose, for example, you have a "facebook" table and a "twitter" table that you join in a query because they both have a "user_id" column or something. If they BOTH also have an "avatar_image" column and you didn't specify the table, MySQL would raise an error telling you it didn't know exactly which one you were asking for.
There is nothing wrong with using full table names; however, if you have long table or field names, it is easier to use aliases for readability.
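A small sketch of the facebook/twitter example, using hypothetical temporary tables. Qualifying a column with its table name or an alias only tells MySQL which column you mean; it doesn't change how efficiently the query runs:

```sql
-- Hypothetical tables from the example above.
CREATE TEMPORARY TABLE facebook (user_id INT PRIMARY KEY, avatar_image VARCHAR(100));
CREATE TEMPORARY TABLE twitter  (user_id INT PRIMARY KEY, avatar_image VARCHAR(100));
INSERT INTO facebook VALUES (1, 'fb_1.png');
INSERT INTO twitter  VALUES (1, 'tw_1.png');

-- Unqualified, this fails: Column 'avatar_image' in field list is ambiguous (error 1052).
-- SELECT avatar_image FROM facebook JOIN twitter ON facebook.user_id = twitter.user_id;

-- Qualified via aliases f and t, each column reference is unambiguous.
SELECT f.avatar_image AS facebook_avatar,
       t.avatar_image AS twitter_avatar
FROM facebook AS f
JOIN twitter  AS t ON f.user_id = t.user_id;
```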