Is it possible to set a column value as a wildcard in MySQL?

This is a super strange question, and its usefulness is probably limited to my problem; I'm going to explain what I'm asking and why I need it.
My problem:
I have a table with, let's say, 2 columns; take the following table as an example:
id | value
 1 | A
 2 | B
 3 | C
 4 | A
 5 | A
Now, if I do a "SELECT id WHERE value = 'A'", I would get 3 results: 1, 4, 5. If I do a "SELECT id WHERE value = 'B'", I would get 1 result: 2. And so on: if there were more entries, I would get the corresponding number of rows as my result, according to the value I'm looking for in my query. It's all good.
But now, here comes my problem. Let's say I want to get every row for every query, with the following restriction:
Do not modify the queries.
If I do "SELECT id WHERE value = 'A'", I would get every id, if I do "SELECT id WHERE value = 'B'", I would get every id, and so on.
"But if I can't modify my query, then what can I do?" You may ask, well, you can modify the table, like changing the value of the column 'value' to a value that would match every value, that's a wildcard, hence the title of the question, but I'm pretty sure if I update all 'value' values to '%', it doesn't work (I tried knowing this wouldn't work, but still, I couldn't lose anything trying).
So, you can do whatever you want; the only restriction is not to modify the queries.
I know this is kind of the inverse of how databases and tables should work, but this is a problem I've been presented with; maybe this is impossible, but maybe it's not.
Edit:
I know this makes little to no sense at all, but I'm asking this as a kind of challenge, appealing to the creative minds out there. Don't worry about vulnerabilities or anything else, just ask yourselves: "How would I do it?"

Before I present any solutions, let me make it clear that you are solving the wrong problem. You should be figuring out how to change your queries; that restriction will keep causing trouble, and any solution that preserves it will be complex enough to generate new problems of its own.
Hopefully this really is just an intellectual exercise.
I'm also going to only give sketches on how to do this, because this is just an intellectual exercise RIGHT?!
The first, and most comprehensive, solution is to "just" change the source code of your MySQL server to respond to the queries however you like. It's an open-source database. Download the source code, change it, recompile, and install.
The downside to this solution (assuming you can make it work) is that it affects every connection to the database and has to be repeated every time you want to upgrade MySQL.
Assuming this is restricted to one table, and that the set of WHERE clauses is fixed, you can duplicate every row in that table so it carries every value which might be queried. For example, if you have ids 1 and 2 and value is only ever A, B or C, you'd make a table like this:
id | value
 1 | A
 1 | B
 1 | C
 2 | A
 2 | B
 2 | C
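A minimal sketch of building that expanded table in SQL, assuming the original table is named t with columns id and value (names are placeholders):

-- Pair every id with every distinct value present in the table.
CREATE TABLE t_expanded AS
SELECT a.id, b.value
FROM t AS a
CROSS JOIN (SELECT DISTINCT value FROM t) AS b;

-- Swap the expanded table in for the original.
RENAME TABLE t TO t_old, t_expanded TO t;

Any fixed query of the form WHERE value = 'X' will then return every id, provided 'X' occurs somewhere in the data.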
Then there are various man-in-the-middle attacks you can do to strip off the WHERE clause. If it's a fixed set of programs which are the problem, you could alter the database API library they use: in Perl this would be the DBI library; in PHP, mysqli or PDO; and so on.
A more comprehensive solution would be to replace the MySQL server's socket (both the TCP and Unix sockets) with your own little server. This would read and parse the MySQL network protocol (you may be able to extract the code to do this from the MySQL source), strip the WHERE clause from the query, and send it on to the real MySQL server.
These are all terrible solutions that are horribly difficult to implement correctly. Even if you got them working 100%, you're left with a system that does strange things to database queries which is likely to cause further problems down the road.
One of the most creative solutions to a problem is to realize you're solving the wrong problem.
I encourage you to post the circumstances that led to this question as another question, because that is the real problem. Also, the management failures which led to it will be a nice train wreck to watch.

Related

Smart Queries That Deal With NULL Values

I recently inherited a table similar to the one in the image below. I don't have the resources to do what should be done in the allotted time, which is obviously to normalize the data: break it into a few smaller tables to eliminate redundancy, etc.
My current idea for a short-term solution is to create a query for each product type and store it in a new table based on ParentSKU. In the image below, a different query would be necessary for each of the 3 example ParentSKUs. This will work okay, but if new attributes are added to a SKU the query needs to be adjusted manually. What would be ideal in the short term (but probably not very likely) is to be able to come up with a query that would only include and display attributes where there weren't any NULL values. The desired results for each of the three ParentSKUs would be the same as they are in the examples below. If there were only 3 queries total, that would be easy enough, but there are dozens of combinations based on the products and categories of each product.
I'm certainly not the man for the job, but there are scores of people way smarter than I am frequenting this site every day who may be able to steer me in a better direction. I realize I'm probably asking for the impossible here, but as the saying goes, "There are no stupid questions, only ill-advised questions that deservedly and/or inadvertently draw the ire of StackOverflow users for various reasons." Okay, I embellished a tad, but you get my point...
I should probably add that this is currently a MySQL database.
Thanks in advance to anyone that attempts to help!
First create SKUTypes with the result of a grouped count (COUNT(col) skips NULLs, so each AttrN count tells you whether that attribute is ever used for the SKU):
CREATE TABLE SKUTypes AS
SELECT ParentSKU,
       COUNT(Attr1) AS Attr1,
       COUNT(Attr2) AS Attr2  -- ...one COUNT per AttrN column
FROM tbl_attr
GROUP BY ParentSKU;
Then create a script which will generate an SQL query for every row of SKUTypes, taking every AttrN column whose value is > 0.
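For illustration, the generated query for one hypothetical ParentSKU whose SKUTypes row has non-zero counts only for Attr1 and Attr3 might look like this (the SKU value, and the assumption of a SKU column, are invented):

-- Only the attributes that are ever non-NULL for this ParentSKU appear.
SELECT SKU, Attr1, Attr3
FROM tbl_attr
WHERE ParentSKU = 'EXAMPLE-SKU';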

MySQL code locks up server

I have code that I have tested on the same table, but with only a few records in it.
It worked great on a handful (30) records. Did exactly what I wanted it to do.
When I added 200 records to the table, it locks up. I have to restart Apache, and I have tried waiting forever for it to finish.
I could use some help figuring out why.
My table has the proper indexes and I am not having trouble in any other way.
Thanks in advance.
UPDATE `base_data_test_20000_rows` SET `NO_TOP_RATING` =
    (SELECT COUNT(`ID`) FROM `base_data_test_20000_rows_2`
     WHERE `base_data_test_20000_rows_2`.`ID` != `base_data_test_20000_rows`.`ID`
       AND `base_data_test_20000_rows_2`.`ANALYST` = `base_data_test_20000_rows`.`ANALYST`
       AND `base_data_test_20000_rows_2`.`IRECCD` =
           (SELECT COUNT(`ID`) FROM `base_data_test_20000_rows_2`
            WHERE `IRECCD` =
                (SELECT MIN(`IRECCD`) FROM `base_data_test_20000_rows_2`
                 WHERE `base_data_test_20000_rows_2`.`ANNDATS_CONVERTED` >= DATE_SUB(`base_data_test_20000_rows`.`ANNDATS_CONVERTED`, INTERVAL 1 YEAR)
                   AND `base_data_test_20000_rows_2`.`ID` != `base_data_test_20000_rows`.`ID`
                   AND `base_data_test_20000_rows_2`.`ESTIMID` = `base_data_test_20000_rows`.`ESTIMID`)))
WHERE `base_data_test_20000_rows`.`ANALYST` != ''
The code is just meant to look a year back for a particular brokerage, get the lowest rating, then count the number of times that analyst had the lowest rating, and write that value to the NO_TOP_RATING column.
I'm pretty sure I was wrong with my original suggestion; changing the SELECT COUNT to its own number won't help, since you have conditions on your query.
This is merely a hackish solution. The real way to solve this would be to optimize your query. But as a work-around, you could set the record count to a MySQL variable, and then reference that variable in the query.
You will have to set the variable before you run the query, which means that if records are added between the time you set the variable and the time the query finishes, you will not have the right count.
http://dev.mysql.com/doc/refman/5.0/en/user-variables.html
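A minimal sketch of that work-around, using the table name from the posted query; note the variable freezes a single uncorrelated count, which is exactly the limitation discussed below:

-- Freeze the row count in a user variable before running the UPDATE...
SET @row_count := (SELECT COUNT(`ID`) FROM `base_data_test_20000_rows_2`);
-- ...then reference @row_count in the UPDATE in place of the inner
-- correlated COUNT(...) subquery.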
further thoughts:
I took a closer look before submitting this answer. That might not actually be possible, since you have WHERE conditions which are individualized to each record.
It's slow because you are using a query that counts within a query that counts within a query that takes a MIN. You are effectively re-scanning the table for every row, and again for each of those rows: three levels deep. So if the database has 10 records, you are possibly visiting rows on the order of 10^3 times. At the number of rows you have, it's hellish.
I'm sure that there is a way to do what you are trying to do, but I can't actually tell what you are trying to do.
I would have to agree with DRapp that seeing dummy data could help us analyze what's really going on.
Since I can't wrap my head around it all, what I would try, without fully understanding what you are doing, would be to create a view for each of your subqueries and then query against those. http://dev.mysql.com/doc/refman/5.0/en/create-view.html
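Purely as a sketch of the mechanism, the innermost MIN() might become a view like the one below. Note this drops the per-row date-window and ID conditions from the original query, so it shows the shape of the idea rather than a drop-in replacement:

-- Lowest rating per brokerage (ESTIMID), ignoring the correlated
-- conditions from the original query.
CREATE VIEW min_ireccd_per_estimid AS
SELECT `ESTIMID`, MIN(`IRECCD`) AS `MIN_IRECCD`
FROM `base_data_test_20000_rows_2`
GROUP BY `ESTIMID`;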
That probably won't escape the redundancy, but it might help with the speed. Since I don't fully understand what you are doing, though, that's probably not the best answer.
Another not-so-good answer: if you aren't running this on a mission-critical DB and it can go offline while you run the query, you could just change your MySQL settings and let the query run for those hours you quoted and hope it doesn't crash. That seems less than ideal, though, as I have no idea whether it requires additional disk space or memory.
So really the best answer I can give you at this point is: try to see if you can approach your problem from a different angle. Or post some dummy data of what the info in base_data_test_20000_rows looks like and what you expect it to look like after the query runs.
- Hope that helps point you in the right direction

Complex SQL String Comparison

I'm merging two databases for a client. In an ideal world, I'd simply use the unique ID to join them, but in this case the newer table has different IDs.
So I have to join the tables on another column. For this I need to use a complex LIKE statement to join on the Title field. But... they have changed the titles of some rows, which breaks the join on those rows.
How can I write a complex LIKE statement to connect slightly different titles?
For instance:
Table 1 Title = Freezer/Pantry Storage Basket
Table 2 Title = Deep Freezer/Pantry Storage Basket
or
Table 1 Title = Buddeez Bread Buddy
Table 2 Title = Buddeez Bread Buddy Bread Dispenser
Again, there are hundreds of rows with titles only slightly different, but inconsistently different.
Thanks!
UPDATE:
How far can MySQL Full-Text Search get me? It looks similar to Shark's suggestion for SQL Server.
http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html
Do it in stages. First get all the ones that match out of the way, so that you are only working with the exceptions. Your mind is incredibly smarter than the computer at finding things that are 'like' each other, so scan over the data, look for similarities, and write SQL statements that cover the specific cases you see, until you get it narrowed down as much as possible.
You will have better results if you 'help' the computer in stages like this than if you try to develop a big routine to cover all cases at once.
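As a sketch of that staged approach, assuming two tables t1 and t2 that each have id and Title columns (all names are placeholders):

-- Stage 1: pair up the rows whose titles match exactly.
SELECT t1.id AS old_id, t2.id AS new_id
FROM t1
JOIN t2 ON t2.Title = t1.Title;

-- Stage 2: for the leftovers, try "one title contains the other",
-- which covers cases like 'Freezer/Pantry Storage Basket' vs
-- 'Deep Freezer/Pantry Storage Basket'.
SELECT t1.id AS old_id, t2.id AS new_id
FROM t1
JOIN t2 ON t2.Title LIKE CONCAT('%', t1.Title, '%')
        OR t1.Title LIKE CONCAT('%', t2.Title, '%');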
Of course there are certainly APIs out there that do this already (such as the one Google uses to guess your search phrase before you finish it), but whether any are freely available, I don't know. It certainly wouldn't hurt to search for one, though.
It's fairly difficult to describe 'only slightly different' in a way that a computer would understand. I suggest choosing a group of criteria that can be considered either most common or most important and working around those. I am not sure what those criteria should be, though, since I have only a vague idea of what the data set looks like.

MySQL: is SELECT with LIKE expensive?

This question is regarding the difference in speed between selecting by an exact match (for example, on an INT column) and a LIKE match on a VARCHAR.
Is there much difference? The main reason I'm asking is that I'm trying to decide whether it's a good idea to leave IDs out of my current project.
For example, instead of:
http://mysite.com/article/391239/this-is-an-entry
Change to:
http://mysite.com/article/this-is-an-entry
Do you think I'll experience any performance problems in the long run? Should I keep the IDs?
Note:
I would use LIKE to make URLs easier for users to remember. For example, if they type "http://mysite.com/article/this-is-an", it would redirect to the correct article.
Regarding the number of pages, let's say I'm at around 79,230 and the app is growing fast, say 1,640 entries per day.
An INT comparison will be faster than a string (varchar) comparison. A LIKE comparison is even slower as it involves at least one wildcard.
Whether this is significant in your application is hard to tell from what you've told us. Unless it's really intensive, i.e. you're doing gazillions of these comparisons, I'd go with clarity for your users.
Another thing to think about: are users always going to type the URL? Or are they simply going to use a search engine? These days I simply search, rather than try to remember a URL. Which would make this a non-issue for me as a user. What are your users like? Can you tell from your application how they access your site?
Firstly, I think it doesn't really matter either way: yes, it will be slower, as a LIKE clause involves more work than a direct comparison, but the difference is negligible on normal sites.
This can be easily tested if you were to measure the time it took to execute your query, there are plenty of examples to help you in this department.
To move away slightly from your question: you have to ask yourself whether you even need to use LIKE for this query, because 'this-is-an-entry' should be unique, right?
SELECT id, friendly_url, name, content FROM articles WHERE friendly_url = 'this-is-an-article';
A "SELECT * FROM x WHERE = 391239" query is going to be faster than "SELECT * FROM x WHERE = 'some-key'" which in turn is going to be faster than "SELECT * FROM x WHERE LIKE '%some-key%'" (presence of wild-cards isn't going to make a heap of difference.
How much faster? Twice as fast? Quite likely. Ten times as fast? Stretching it, but possible. The real questions here are 1) does it matter, and 2) should you even be using LIKE in the first place.
1) Does it matter
I'd probably say not. If you indeed have 391,239+ unique articles/pages - and assuming you get a comparable level of traffic, then this is probably just one of many scaling problems you are likely to encounter. However, I'd warrant this is not the case, and therefore you shouldn't worry about a million page views until you get to 1 million and one.
2) Should you even be using LIKE
No. If the page/article title/name is part of the URL "slug", it has to be unique. If it's not, then you are shooting yourself in the foot in terms of SEO and writing yourself a maintenance nightmare. If the title/name is unique, then you can just use "WHERE title = 'some-page'", making sure the title column has a unique index on it.
Edit
Your plan of using LIKE for the URLs is utterly, utterly crazy. What happens if someone visits
yoursite.com/articles/the
Do you return a list of all the pages starting with "the"? What then happens if:
Author A creates
yoursite.com/articles/stackoverflow-is-massive
2 days later Author B creates
yoursite.com/articles/stackoverflow-is-massively-flawed
Not only will A be pretty angry that his article has been hijacked, all the permalinks he may have sent out will be broken, and Google is never going to give your articles any reasonable page rank because the content keeps changing and effectively diluting itself.
Sometimes there is a pretty good reason you've never seen your amazing new "idea/feature/invention/time-saver" anywhere else before.
INT is much faster.
In the string case I think you should run the query not with LIKE but just with =, because you are looking for this-is-an-entry, not for this-is-an-entry-and-something.
There are a few things to consider:
Most of the time, the search performed on the database will be an "index seek": a search for a single row using an index.
This type of exact-match operation on a single row is not significantly faster using ints than strings; they are basically the same cost for any practical purpose.
What you can do is the following optimization: first search the database using an exact match (no wildcards), which is as fast as using an int index. If there is no match, do a fuzzy search (a search using wildcards); this is more expensive, but it is also rarer and can produce more than one result. A form of result ranking is needed if you want to go for best match.
Pseudocode:
Search for an exact match on the string: Article = 'entry'
if (match found) display page
if (no match) search again using wildcards: Article LIKE '%entry%'
if (one appropriate match found) display page
if (several relevant matches found) display a "Did you mean ...?" page
if (no matches found) display error page
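A minimal SQL version of that two-stage lookup, assuming an articles table with an indexed slug column (names are assumptions):

-- Stage 1: exact match; this can use the index on slug.
SELECT id, title FROM articles WHERE slug = 'this-is-an-entry';

-- Stage 2, only if stage 1 found nothing: wildcard search. A literal
-- prefix ('this-is-an%') can still use the index; a leading '%' cannot.
SELECT id, title FROM articles WHERE slug LIKE 'this-is-an%' LIMIT 10;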
Note: keep in mind that fuzzy URLs are not recommended from an SEO perspective, because people can link to your site using multiple URLs, which will split your page rank instead of increasing it.
If you put an index on the varchar field it should be OK performance-wise; it really depends on how many pages you are going to have. You also have to be more careful and sanitize the string to prevent SQL injection, e.g. only allow a-z, 0-9, -, _, etc. in your query.
I would still prefer an integer id, as it is faster and safer; change the format to something nicer, like:
http://mysite.com/article/21-this-is-an-entry.html
As said, an INT comparison beats a VARCHAR one, and if the table is indexed on the field you're searching, that'll help too, as the server won't have to scan the whole table.
One thing which will help you validate your queries for speed and sense is EXPLAIN. You can use it to show which indexes your query is using, and roughly how much work it will do.
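For instance, with the hypothetical query from earlier in this thread:

-- Prefix any SELECT with EXPLAIN to see the chosen index (the "key"
-- column of the output) and the estimated number of rows examined.
EXPLAIN SELECT id, friendly_url, name, content
FROM articles
WHERE friendly_url = 'this-is-an-article';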
To answer your question: if it's possible to build your system using exact matches on the article ID (i.e. an INT), then it'll be much "lighter" than trying to match the whole URL using a LIKE statement. LIKE will obviously work, but I wouldn't want to run a large, high-traffic site on it.

Best approach to construct complex MySQL joins and groups?

I find that when trying to construct complex MySQL joins and groups between many tables I usually run into strife and have to spend a lot of 'trial and error' time to get the result I want.
I was wondering how other people approach the problems. Do you isolate the smaller blocks of data at the end of the branches and get these working first? Or do you start with what you want to return and just start linking tables on as you need them?
Also wondering if there are any good books or sites about approaching the problem.
I don't work in MySQL, but I do frequently write extremely complex SQL, and here's how I approach it.
First, there is no substitute whatsoever for thoroughly understanding your database structure.
Next I try to break up the task into chunks.
For instance, suppose I'm writing a report concerning the details of a meeting (the company I work for does meeting planning). I will need to know the meeting name and sales rep, the meeting venue and dates, the people who attended, and the speaker information.
First I determine which of the tables will have the information for each field in the report. Now I know what I will have to join together, but not exactly how as yet.
So first I write a query to get the meetings I want. This is the basis for all the rest of the report, so I start there. Now the rest of the report can probably be done in any order, although I prefer to work through the parts that should have one-to-one relationships first; so next I'll add the joins and the fields that will get me all the sales-rep-associated information.
Suppose I only want one rep per meeting (if there are multiple reps, I only want the main one) so I check to make sure that I'm still returning the same number of records as when I just had meeting information. If not I look at my joins and decide which one is giving me more records than I need. In this case it might be the address table as we are storing multiple address for the rep. I then adjust the query to get only one. This may be easy (you may have a field that indicates the specific unique address you want and so only need to add a where condition) or you may need to do some grouping and aggregate functions to get what you want.
Then I go on to the next chunk (working first through all the chunks that should have a one-to-one relationship to the central data, in this case the meeting). Run the query and check the data after each addition.
Finally I move to those records which might have a one-many relationship and add them. Again I run the query and check the data. For instance, I might check the raw data for a particular meeting and make sure what my query is returning is exactly what I expect to see.
Suppose in one of these additions of a join I find the number of distinct meetings has dropped. Oops, then there is no data in one of the tables I just added and I need to change that to a left join.
Another time I may find too many records returned. Then I look to see if my where clause needs more filtering info, or if I need to use an aggregate function to get the data I need. Sometimes I will add other fields to the report temporarily to see if I can spot what is causing the duplicated data. This helps me know what needs to be adjusted.
The real key is to work slowly, understand your data model and check the data after every new chunk is added to make sure it is returning the results the way you think they should be.
Sometimes, if I'm returning a lot of data, I will temporarily put an additional where clause on the query to restrict it to a few items I can easily check. I also strongly suggest the use of order by, because it will help you see if you are getting duplicated records.
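A compressed sketch of that chunk-by-chunk process, with every table and column name invented for illustration:

-- Step 1: the base query; note the row count.
SELECT m.meeting_id, m.meeting_name
FROM meetings m;

-- Step 2: add a one-to-one chunk (the main sales rep), re-run, and
-- confirm the row count is unchanged.
SELECT m.meeting_id, m.meeting_name, r.rep_name
FROM meetings m
JOIN reps r ON r.rep_id = m.main_rep_id;

-- Step 3: one-to-many chunks last; LEFT JOIN where the child table
-- may have no rows, and ORDER BY to make duplicates easy to spot.
SELECT m.meeting_id, m.meeting_name, r.rep_name, a.attendee_name
FROM meetings m
JOIN reps r ON r.rep_id = m.main_rep_id
LEFT JOIN attendees a ON a.meeting_id = m.meeting_id
ORDER BY m.meeting_id;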
Well, the best approach to breaking down your MySQL query is to run the EXPLAIN command, alongside the MySQL documentation on optimization with EXPLAIN.
MySQL provides some great free GUI tools as well; the MySQL Query Browser is what you need to use.
Running the EXPLAIN command will break down how MySQL interprets your query and display its complexity. It might take some time to decode the output, but that's another question in itself.
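For example (the tables here are placeholders; any SELECT can be prefixed this way):

-- EXPLAIN prints one row per table in the join, showing the join
-- order, the index used ("key"), and the estimated rows examined.
EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;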
As for a good book, I would recommend High Performance MySQL: Optimization, Backups, Replication, and More.
I haven't used them myself so can't comment on their effectiveness, but perhaps a GUI based query builder such as dbForge or Code Factory might help?
And while the use of Venn diagrams to think about MySQL joins doesn't necessarily help with the SQL itself, they can help you visualise the data you are trying to pull back (see Jeff Atwood's post).