MySQL row priority

I have a database table full of some really ugly and messy data. In a separate table, I have a cleaner version of the data and they are linked by an id, but I need to keep the messy dataset and can't overwrite it as I use it to check against data differences.
I'm trying to merge the data into a new table, OR use a single query across both tables and give the clean table results priority in the result.
So if id=3, uglydata=x7z and cleandata=xyz, then I'd get the clean data; if cleandata were null, I'd get the ugly data.
I tried selecting cleandata AS uglydata, hoping that MySQL would just overwrite the other field, but that doesn't work (and yes, it's weird and I figured that wouldn't work).
Is there a nice way to do this?
The other solution I can think of is just to insert into the new table from the clean data first, and then insert from the ugly data, since the id is unique.
But I'm hoping I'll be able to prioritize results by type or something.

SELECT IFNULL(c.cleandata, u.uglydata) AS data
FROM uglytable u
LEFT OUTER JOIN cleantable c
  ON c.clean_id = u.ugly_id
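Alternatively, the fallback idea from your question works fine too. Here is a minimal sketch (newtable is a made-up name), assuming the merged table has a unique key on id so the second insert skips the ids the clean data already covered:

-- clean rows win: insert them first
INSERT INTO newtable (id, data)
SELECT clean_id, cleandata
FROM cleantable
WHERE cleandata IS NOT NULL;

-- ugly rows only fill the gaps, thanks to the unique key on id
INSERT IGNORE INTO newtable (id, data)
SELECT ugly_id, uglydata
FROM uglytable;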


MySQL: migrate some content of one table into another table with a different structure

Here is what I would like to do: I have an old vBulletin forum with a couple thousand threads and replies. I developed a Laravel forum application myself and would like to migrate the data from the old forum into the new one, even though the structure is completely different. What's more, in my new MySQL table I use base64 IDs that I have to generate manually, so I guess I cannot use pure MySQL to fill this in.
Essentially I need to do something like this:
Old table ---- new table
'thread' -> 'title'
'text' -> 'body'
and so on... plus the thing with the base64 IDs.
Any idea how to approach this? I didn't quite find anything useful using search, probably because I wasn't looking for the right keywords. Thanks a lot!
Here's how to approach this kind of problem.
Start by doing a SELECT operation to generate a result set looking like your new table's columns. Maybe something like this will work for you.
SELECT thread AS title,
text AS body,
TO_BASE64(SHA2(UUID(), 224)) AS base64id
FROM old_table
(I guessed about what belongs in your base64id column. TO_BASE64(SHA2(UUID(), 224)) generates hard-to-guess pseudorandom values. You can use anything here.)
Once you're satisfied with your result set, you can do
INSERT INTO new_table (title, body, base64id)
SELECT .... your select query
to put rows into the new table.
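Putting the two steps together, the complete statement (still using the guessed expression for base64id) would be something like:

INSERT INTO new_table (title, body, base64id)
SELECT thread,
       text,
       TO_BASE64(SHA2(UUID(), 224))
FROM old_table;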

Selecting rows based on duplicate values

I basically have a results grid, and I have a drop-down menu in my application which filters the 'Carrier' column. But when selecting a certain carrier, I want all the rows returned that have the same dr_id as the carrier which has been selected.
For example if you look at the picture attached it shows my results grid. If I filter by carrier 'ACE CALL LTD_UK' then I want rows 27, 28, 29 and 30 returned because the dr_id is the same.
Thanks
I don't have a complete solution for you, as I don't know exactly what your database schema is (and it is a large stored procedure!). However, I do have some suggestions/comments which you might find helpful:
I assume that the stored procedure currently returns a single row when the filter is set to 'ACE CALL LTD_UK'; if not, then this might not be relevant!
What I would do in this case is take your SELECT statement and put the results into a CTE, temporary table or nested query. (I'm not sure which SQL DBMS you're using; it looks like MSSQL, but you also have a MySQL tag on your post.)
Once I have those results I would then use a LEFT JOIN from the dr_id in the temp table back to the drm table on the same column. From here you will again need to join to other tables where the data is not distinct, for example the Carrier table, then select the columns that you require.
You could do all this in the existing SELECT statement, but you would have to start joining on tables multiple times or use nested queries, which would get very messy. The main reason I chose the solution I posted, though, is that I don't know the stored procedure well enough, so I chose the safest option.
If you want an example of what I mean, I will try and provide one.
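For example, a self-join along dr_id might look like this (just a sketch; results_grid and the exact column names are guesses based on your screenshot):

SELECT DISTINCT matched.*
FROM results_grid AS picked
JOIN results_grid AS matched
  ON matched.dr_id = picked.dr_id
WHERE picked.Carrier = 'ACE CALL LTD_UK';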

Optimized SELECT query in MySQL

I have a very large number of rows in my table, table_1. Sometimes I just need to retrieve a particular row.
I assume that when I use a SELECT query with a WHERE clause, it loops from the very first row until it finds a match.
Is there any way to make the query jump to a particular row and then start from that row?
Example:
Suppose there are 50,000,000 rows and the id I want to search for is 53750. What I need is for the search to start from row 50,000, so that it saves the time of scanning the first 49,999 rows.
I don't know the exact term, since I am not an expert in SQL!
You need to create an index : http://dev.mysql.com/doc/refman/5.1/en/create-index.html
ALTER TABLE table_1 ADD UNIQUE INDEX (id);
The way I understand it, you want to select a row with id 53750. If you have a field named id you could do this:
SELECT * FROM table_1 WHERE id = 53750
Along with indexing the id field; that's the fastest way to do it, as far as I know.
ALTER TABLE table_1 ADD UNIQUE INDEX (<column>);
Would be a great first step if it has not been generated automatically. You can also use:
EXPLAIN <your query here>
To see which kind of query works best in this case. Note that if you later change the WHERE clause to filter on another column that keeps turning up in your queries, it's a good idea to put an index on that column as well.
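For example:

EXPLAIN SELECT * FROM table_1 WHERE id = 53750;

If the key column in the output names your index and the rows estimate is small, the index is being used.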
Create an index on the column you want to do the SELECT on:
CREATE INDEX index_1 ON table_1 (id);
Then, select the row just like you would before.
But also, please read up on databases, database design and optimization. Your question is full of false assumptions. Don't just copy and paste our answers verbatim. Get educated!
There are several things to know about optimizing SELECT queries, like range and WHERE clause optimization; the documentation is pretty informative about this issue, so read the section Optimizing SELECT Statements. Creating an index on the column you evaluate is very helpful for performance too.
One possible solution: you can create a view, then query from the view. Here are the details on creating a view and obtaining data from it:
http://www.w3schools.com/sql/sql_view.asp
Now you just split that huge number of rows into many views (i.e. rows 1-10000 in one view, then 10001-20000 in another view) and query from the appropriate view.
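A sketch of that idea (the names are hypothetical, and note that a view is just a saved query, so this only helps if the id column is indexed anyway):

CREATE VIEW rows_50001_to_100000 AS
SELECT * FROM table_1 WHERE id BETWEEN 50001 AND 100000;

SELECT * FROM rows_50001_to_100000 WHERE id = 53750;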
I am pretty sure that any SQL database with a little self-respect does not loop from the first row to get the desired row. But I am also not sure how they make it work, so I can't give an exact answer.
You could check what's in your WHERE clause and how the table is indexed. Do you have a proper primary key, e.g. one using a numeric data type? Do you have indexes on the other columns that are used in your queries?
There is also a lot to consider when installing the database server, like where to put the data and log files, how much memory to give the server, and how to set the file growth. There's a lot you can do to tune your server.
You could try to split your table into partitions; the MySQL documentation explains quite well how to alter a table to add partitions and how to select from a specific partition. In your case you could create a partition on id for every 50,000 rows, and when you want to skip the first 50,000 you just select from partition 2.
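A sketch of what that could look like (assuming id is the primary key; MySQL range partitioning requires the partitioning column to be part of every unique key):

ALTER TABLE table_1
PARTITION BY RANGE (id) (
    PARTITION p0 VALUES LESS THAN (50000),
    PARTITION p1 VALUES LESS THAN (100000),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- MySQL 5.6+ can then read from one partition explicitly
SELECT * FROM table_1 PARTITION (p1) WHERE id = 53750;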
You may try something as simple as this:
SELECT * FROM tblname LIMIT 50000, 10;
I just tried it with phpMyAdmin; the first number (50000) is the row offset to start from, and the second is how many rows to return.
EDIT:
But if I were you I wouldn't use this one, because it leaves records 1-49999 out of the search entirely.

MySQL CONCAT '*' symbol toasts the database

I have a table used for lookups which stores the human-readable value in one column and the same text stripped of special characters and spaces in another. E.g., the value "Children's Shows" would appear in the lookup column as "childrens-shows".
Unfortunately the corresponding main table isn't quite that simple - for historical reasons I didn't create myself and now would be difficult to undo, the lookup value is actually stored with surrounding asterisks, e.g. '*childrens-shows*'.
So, while trying to join the lookup table sans-asterisks with the main table that has asterisks, I figured CONCAT would help me add them on-the-fly, e.g.:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON m.value = CONCAT('*', l.value, '*')
... and then the table was toast. Not sure if I created an infinite loop or really screwed the data, but it required an ISP backup to get the table responding again. I suspect it's because the '*' symbol is probably reserved, like a wildcard, and I've asked the database to do the equivalent of licking its own elbow. Either way, I'm hesitant to 'experiment' to find the answer given the spectacular way it managed to kill the database.
Thanks in advance to anyone who can (a) tell me what the above actually did to the database, and (b) how I should actually join the tables?
When using CONCAT, mysql won't use the index. Use EXPLAIN to check this, but a recent problem I had was that on a large table, the indexed column was there, but the key was not used. This should not bork the whole table however, just make it slow. Possibly it ran out of memory, started to swap and then crashed halfway, but you'd need to check the logs to find out.
However, the root cause is clearly bad table design and that's where the solution lies. Any answer you get that allows you to work around this can only be temporary at best.
Best solution is to move this data into a separate table. 'Childrens shows' sounds like a category and therefore repeated data in many rows. This should really be an id for a 'categories' table, which would prevent the DB from having to run CONCAT on every single row in the table, as you could do this:
SELECT *
FROM main_table m
INNER JOIN lookup_table l
ON l.value = m.value
/* and optionally */
INNER JOIN categories cat
ON l.value = cat.id
WHERE cat.name = 'whatever'
I know this is not something you may be able to do given the information you supplied in your question, but really the reason for not being able to make such a change to a badly normalised DB is more important than the code here. Without either the resources or political backing to do things the right way, you will end up with even more headaches like this, which will end up costing more in the long term. Time for a word with the boss perhaps :)
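If you do get the backing to fix the data itself, the one-time cleanup could be as simple as this sketch (assuming the asterisks only ever appear at the two ends of the value):

UPDATE main_table
SET value = TRIM(BOTH '*' FROM value);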

adding data to interrelated tables..easier way?

I am a bit rusty with MySQL and trying to jump in again, so sorry if this is too easy a question.
I basically created a data model that has a table called "Master" with required fields of a name and an IDcode, and then a "Details" table with a foreign key of IDcode.
Now here's where it's getting tricky. I am entering:
INSERT INTO Details (Name, UpdateDate) VALUES (name, updateDate)
I get an error saying IDcode on Details doesn't have a default value; so I add one, and then it complains that Field 'Master_IDcode' doesn't have a default value.
It all makes sense, but I'm wondering if there's an easy way to do what I am trying to do. I want to add data into Details, and if no IDcode exists, I want to add an entry into the Master table. The problem is I have to first add the name to the Master table, wait for a unique ID to be generated (for IDcode), then figure that out and add it to my query when I enter the detail data. As you can imagine, the queries are probably going to get quite long, since I have many tables.
Is there an easier way, where every time I add something it searches by name whether a foreign key exists and, if not, adds it to all the tables it's linked to? Is there a standard way people do this? I can't imagine that, with all the complex databases out there, people have not figured out an easier way.
Sorry if this question doesn't make sense. I can add more information if needed.
P.S. This may be a different question, but I have heard of Django for Python and that it helps create queries; would it help my situation?
Thanks so much in advance :-)
(decided to expand on the comments above and put it into an answer)
I suggest creating a set of staging tables in your database (one for each data set/file).
Then use LOAD DATA INFILE (or insert the rows in batches) into those staging tables.
Make sure you drop indexes before the load, and re-create what you need after the data is loaded.
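For example, loading a CSV into a staging table might look like this (the file path and delimiters are placeholders for whatever your export produces):

LOAD DATA INFILE '/tmp/details.csv'
INTO TABLE staging_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;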
You can then make a single pass over the staging table to create the missing master records. For example, let's say that one of your staging tables contains a country code that should be used as a masterID. You could add the missing master records by doing something along the lines of:
insert
into master_table(country_code)
select distinct s.country_code
from staging_table s
left join master_table m on(s.country_code = m.country_code)
where m.country_code is null;
Then you can proceed and insert the rows into the "real" tables, knowing that all detail rows reference a valid master record.
If you need to get reference information along with the data (such as translating some code) you can do this with a simple join. Also, if you want to filter rows by some other table this is now also very easy.
insert
into real_table_x(
`key`
,colA
,colB
,colC
,computed_column_not_present_in_staging_table
,understandableCode
)
select x.key
,x.colA
,x.colB
,x.colC
,(x.colA + x.colB) / x.colC
,c.understandableCode
from staging_table_x x
join code_translation c on(x.strange_code = c.strange_code);
This approach is a very efficient one and it scales very nicely. Variations of the above are commonly used in the ETL part of data warehouses to load massive amounts of data.
One caveat with MySQL is that it doesn't support hash joins, which is a join mechanism very suitable to fully join two tables. MySQL uses nested loops instead, which mean that you need to index the join columns very carefully.
InnoDB tables with their clustering feature on the primary key can help to make this a bit more efficient.
One last point: when you have the staging data inside the database, it is easy to add some analysis of the data and put aside "bad" rows in a separate table. You can then inspect the data using SQL instead of wading through csv files in your editor.
I don't think there's a one-step way to do this.
What I do is issue a
INSERT IGNORE INTO master (..) VALUES (..)
to the master table, which will either create the row if it doesn't exist or do nothing, and then issue a
SELECT id FROM master WHERE someUniqueAttribute = ..
The other option would be stored procedures/triggers, but they are still pretty new in MySQL and I doubt whether this would help performance.
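Applied to the Master/Details tables from the question, that pattern might look like the sketch below (assuming Master.Name has a UNIQUE index, which INSERT IGNORE needs in order to detect an existing row):

INSERT IGNORE INTO Master (Name) VALUES ('some name');

-- fetch the generated (or pre-existing) id
SELECT IDcode INTO @id FROM Master WHERE Name = 'some name';

-- then use it for the detail row
INSERT INTO Details (Master_IDcode, Name, UpdateDate)
VALUES (@id, 'some name', NOW());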