Comparing two large data sets or tables in MySQL

Let's say I have two large CSV files (more than 1 million rows each), and each is a list of names with a rank. The goal is to find the names that appear in both lists, the names unique to list 1, and the names unique to list 2.
I wanted to do this in MySQL, so I created a table for each list, but looping through over a million records a million times seems like a poor and very slow way of doing this. How would you do this?
Here is an example of a (bad) query: http://sqlfiddle.com/#!2/9f272/2

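The following query counts how many names fall into each combination of "appears in table 1" and "appears in table 2" (it also returns a sample minimum and maximum name for each combination). If names are unique within each table, it might return something like: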
InTable1 InTable2 Count
1 0 xxx
0 1 yyy
1 1 zzz
The query uses a union all and group by:
select InTable1, InTable2, count(*), min(name), max(name)
from (select name, sum(which = 1) as InTable1, sum(which = 2) as InTable2
      from ((select name, 1 as which
             from table1
            ) union all
            (select name, 2 as which
             from table2
            )
           ) u
      group by name
     ) t
group by InTable1, InTable2;
EDIT:
You need to create indexes. Here is the syntax:
create index table1_name on table1(name);
create index table2_name on table2(name);
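If you want the names themselves rather than the counts, the same UNION ALL plus GROUP BY shape can be reused. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere, and the function name classify_names and the sample names in the usage note are made up for illustration.

```python
import sqlite3

def classify_names(list1, list2):
    """Return (in_both, only_in_1, only_in_2) as sorted lists of names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (name TEXT)")
    conn.execute("CREATE TABLE table2 (name TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(n,) for n in list1])
    conn.executemany("INSERT INTO table2 VALUES (?)", [(n,) for n in list2])
    # Indexes on name, as suggested in the answer above.
    conn.execute("CREATE INDEX table1_name ON table1(name)")
    conn.execute("CREATE INDEX table2_name ON table2(name)")
    # One pass over both tables: tag each row with its source, then
    # collapse to one row per name with a 0/1 flag per table.
    rows = conn.execute("""
        SELECT name,
               MAX(which = 1) AS in_table1,
               MAX(which = 2) AS in_table2
        FROM (SELECT name, 1 AS which FROM table1
              UNION ALL
              SELECT name, 2 AS which FROM table2) AS t
        GROUP BY name
        ORDER BY name
    """).fetchall()
    conn.close()
    in_both = [n for n, a, b in rows if a and b]
    only_in_1 = [n for n, a, b in rows if a and not b]
    only_in_2 = [n for n, a, b in rows if b and not a]
    return in_both, only_in_1, only_in_2
```

For example, classify_names(["ann", "bob", "cat"], ["bob", "cat", "dan"]) puts bob and cat in both lists, ann only in list 1, and dan only in list 2. On a real MySQL server you would run just the SELECT and filter on the two flags.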

Related

Select least 6 values across 100 columns in a MySQL row

I have a table with 103 columns. First column (rowID) is the row index, the next one contains a date, and the third one contains a string (a name), then there are 100 columns (named A1 to A100) that each contain an integer. I am trying to write a query to fetch the lowest 6 values among those 100 columns, for each row.
Here is what I tried. I had to write out all 100 columns (is there a better way?), and this only gets me the smallest 1, and NOT the smallest 6:
SELECT LEAST(A1,A2,A3,A4,...A100) FROM myTable WHERE rowID=1
I am thinking maybe I can use 5 queries to run the least command each time, returning the result to the backend which will then exclude the column that contained the least value in the previous query. However I am not sure this is the best way because I am trying to keep it all within MySQL. Is there a way to use sub-queries to do this? Or another effective method. Any help would be appreciated!
Edit: I also need to know the columns from which those minimum 6 values were obtained.
You seem to be storing a multi-valued attribute in a denormalized way.
If you need to do set-oriented comparisons on these values, they should be stored in rows, not columns.
You can "unpivot" them, so each value is on its own row, like this:
SELECT 1 AS ValNo, A1 AS Val FROM MyTable WHERE rowID=1
UNION ALL
SELECT 2, A2 FROM MyTable WHERE rowID=1
UNION ALL
SELECT 3, A3 FROM MyTable WHERE rowID=1
UNION ALL
SELECT 4, A4 FROM MyTable WHERE rowID=1
UNION ALL
...
UNION ALL
SELECT 100, A100 FROM MyTable WHERE rowID=1
Then by putting that into a subquery, get the lowest 6 values.
SELECT ValNo, Val
FROM ( ... subquery above ... ) AS t
ORDER BY Val
LIMIT 6
You would be better off to store a table with one column for the value, and up to 100 rows for each rowId:
CREATE TABLE MyNewTable (
RowId INT,
OrdinalId TINYINT, -- 1 to 100
Aval INT,
PRIMARY KEY (RowId, OrdinalId)
);
Then you can query it more simply:
SELECT OrdinalId, Aval
FROM MyNewTable
WHERE RowId = 1
ORDER BY Aval
LIMIT 6;
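To see the normalized layout in action, here is a small sketch under a few assumptions: it uses Python's built-in sqlite3 module instead of MySQL so it is runnable as-is, the helper name lowest_values is hypothetical, and ties are broken by OrdinalId (which the query above does not specify).

```python
import sqlite3

def lowest_values(values, n=6, row_id=1):
    """Store one row's values in the normalized layout and return the
    n lowest as (OrdinalId, Aval) pairs, so the source column is known."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE MyNewTable (
            RowId INTEGER,
            OrdinalId INTEGER,  -- 1 to 100, i.e. A1 to A100
            Aval INTEGER,
            PRIMARY KEY (RowId, OrdinalId)
        )
    """)
    conn.executemany(
        "INSERT INTO MyNewTable VALUES (?, ?, ?)",
        [(row_id, i, v) for i, v in enumerate(values, start=1)],
    )
    rows = conn.execute(
        "SELECT OrdinalId, Aval FROM MyNewTable "
        "WHERE RowId = ? ORDER BY Aval, OrdinalId LIMIT ?",
        (row_id, n),
    ).fetchall()
    conn.close()
    return rows
```

Each returned pair carries the ordinal, so OrdinalId 4 means the value came from the old column A4. That covers the "which column did it come from" part of the question.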

How to get max value of a column from two tables

I have one table named 'posts' and one table named 'threads'. Both have the column called 'total_id' which is an integer.
Now, how to get from those two tables the 1 highest value (max) in the 'total_id' column? (MySQL)
You can get it as follows:
SELECT Greatest(
(SELECT Max(total_id) FROM posts),
(SELECT Max(total_id) FROM threads)
)
It's not entirely clear what result you expect, but if you want the maximum total_id across the data of both tables combined, you can do a UNION and then take the highest value:
select max(total_id) as max_total_id from (
select total_id from posts
union
select total_id from threads ) xxx;
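One difference between the two approaches is worth checking: in MySQL, GREATEST returns NULL if any argument is NULL, so the first query gives NULL when either table is empty, while the UNION form still returns the maximum of the non-empty table. The sketch below exercises the UNION form using Python's built-in sqlite3 module as a stand-in for MySQL. The function name max_total_id is made up, and it uses UNION ALL because removing duplicates is not needed just to take a maximum.

```python
import sqlite3

def max_total_id(posts_ids, threads_ids):
    """Return the highest total_id across both tables, or None if both are empty."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (total_id INTEGER)")
    conn.execute("CREATE TABLE threads (total_id INTEGER)")
    conn.executemany("INSERT INTO posts VALUES (?)", [(i,) for i in posts_ids])
    conn.executemany("INSERT INTO threads VALUES (?)", [(i,) for i in threads_ids])
    # Stack both id columns, then take a single MAX over the combined set.
    (result,) = conn.execute("""
        SELECT MAX(total_id) AS max_total_id
        FROM (SELECT total_id FROM posts
              UNION ALL
              SELECT total_id FROM threads) AS combined
    """).fetchone()
    conn.close()
    return result
```

With posts holding 1, 5, 3 and threads holding 2, 4, this returns 5. With an empty posts table it still returns the maximum from threads.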

Fetch data from more than 5 tables by a common UserID in MySQL

We are trying to build an application using PHP and MySQL. We have more than 5 category tables, such as tblcategory1, tblcategory2, tblcategory3, and so on.
Each of the category tables has a common userID field.
We want to check across all the tables which entries exist for a particular userID.
Is there a query for checking multiple tables at once?
Thank you (in advance)
Yes, you need to look at the UNION keyword. It roughly works like this:
SELECT columnName
FROM category1
WHERE userID = 1
UNION
SELECT columnName
FROM category2
WHERE userID = 1
UNION
SELECT columnName
FROM category3
WHERE userID = 1
This is essentially 3 separate queries which all run as one, and their results are combined into one list of results.
For more info, see the MySQL documentation on UNION.
An alternative structure to @LaurenceFrost's answer is to use UNION ALL to bring the tables together, then filter that result.
SELECT
columnName
FROM
(
SELECT userID, columnName FROM tblcategory1
UNION ALL
SELECT userID, columnName FROM tblcategory2
UNION ALL
SELECT userID, columnName FROM tblcategory3
UNION ALL
SELECT userID, columnName FROM tblcategory4
UNION ALL
SELECT userID, columnName FROM tblcategory5
)
ilvCategoryAll
WHERE
userID = 1
This layout becomes much more friendly when you have JOINs and other business logic, which you don't really want to have to repeat for every source table.
Also, it should be noted that I used UNION ALL rather than UNION. This is because UNION expressly removes duplicates, which can be an expensive process (even if there are no duplicates to find). UNION ALL, however, does not do this de-duplication and is significantly lower cost.
Note: the ilv in ilvCategoryAll means "in-line-view".
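Here is a rough, runnable sketch of the inline-view layout, with some assumptions stated up front: it uses Python's built-in sqlite3 module as a stand-in for MySQL, it only builds three of the category tables, the function name entries_for_user is hypothetical, and the extra source column (which table a row came from) is my addition rather than part of the query above.

```python
import sqlite3

def entries_for_user(rows_by_table, user_id):
    """rows_by_table maps a table name to a list of (userID, columnName) rows.
    Returns (source_table, columnName) pairs for the given user."""
    names = ["tblcategory1", "tblcategory2", "tblcategory3"]
    conn = sqlite3.connect(":memory:")
    for name in names:
        conn.execute(f"CREATE TABLE {name} (userID INTEGER, columnName TEXT)")
        conn.executemany(
            f"INSERT INTO {name} VALUES (?, ?)", rows_by_table.get(name, [])
        )
    # Build the inline view once; the filter is applied a single time outside it.
    union_sql = " UNION ALL ".join(
        f"SELECT '{name}' AS source, userID, columnName FROM {name}"
        for name in names
    )
    rows = conn.execute(
        f"SELECT source, columnName FROM ({union_sql}) AS ilvCategoryAll "
        "WHERE userID = ? ORDER BY source, columnName",
        (user_id,),
    ).fetchall()
    conn.close()
    return rows
```

Because this uses UNION ALL, an identical value stored in two category tables comes back twice (once per source), where a plain UNION over columnName alone would have collapsed it to one row.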

Select id from array which is not in table

I have a comma-separated string of ids in PHP, like 1,2,3. I have a MySQL table which has an id column.
Table task_comments:
id
--
1
2
I want to get all the ids in the list which are not in the table. Here I would like to get 3 as the result.
Currently I am building a query like the following in PHP and it is working.
SELECT id FROM (
SELECT 1 id FROM DUAL
UNION ALL
SELECT 2 id FROM DUAL
UNION ALL
SELECT 3 id FROM DUAL
) a WHERE id NOT IN (SELECT id FROM task_comments);
I don't think this is a good way to do this. I want to know if there is a better method to do this, because if the list is big the union list will grow.
Thanks
PS: I can post the PHP code used to make the query also if needed.
PPS: I would like to know if there is better MySQL Query.
Your comma-separated values in PHP:
$my_ids = "1,2,3";
SQL query in PHP:
$query = "SELECT id FROM task_comments WHERE id IN ($my_ids)";
This returns those of the ids 1, 2, and 3 that exist in the database.
Then you can simply compare that result against your original list.
What you do is already the way to do it. There is no other way to create sets to reason over than the (pretty ugly) union construct. You can leave off the "from dual"s and replace the union alls with plain unions to make it shorter, although with a very large list union all might be the more performant solution, as it does not sort for duplicate deletion.
SELECT id FROM (
SELECT 1 id
UNION
SELECT 2 id
UNION
SELECT 3 id
) a WHERE id NOT IN (SELECT id FROM task_comments);
You might also want to have a look at temporary tables. That way you could create the set you need in a more natural way without hitting the limits of the large SQL involving unions.
CREATE TEMPORARY TABLE temp_table (id int);
INSERT INTO temp_table VALUES (1),(2),(3); -- or just repeat for as many values as you might have from your app (batch insert?)
SELECT id FROM temp_table
WHERE id NOT IN (SELECT id FROM task_comments);
See the MySQL documentation on temporary tables for more.
You can do it like this. First select the ids that do exist:
SELECT id FROM task_comments WHERE id IN (1,2,3)
(here (1,2,3) is built from your array data, for example via the implode() function)
Then, in a loop, fetch the returned ids into an array and use array_diff() to find the absent values.
Maybe you should first fetch all the distinct ids from the table that are present in your string of ids:
SELECT DISTINCT id FROM task_comments WHERE id IN (1,2,3..)
and then compare the two.
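The "query what exists, then diff in the application" idea from the last two answers can be sketched as follows. This is an illustration in Python rather than PHP, using the built-in sqlite3 module as a stand-in for MySQL. The function names missing_ids and demo are made up, and a set difference plays the role of array_diff().

```python
import sqlite3

def missing_ids(conn, candidate_ids):
    """Return the candidate ids that have no row in task_comments, sorted."""
    candidate_ids = list(candidate_ids)
    if not candidate_ids:
        return []
    # One placeholder per id keeps the values out of the SQL string itself.
    placeholders = ",".join("?" for _ in candidate_ids)
    found = {
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT id FROM task_comments WHERE id IN ({placeholders})",
            candidate_ids,
        )
    }
    # Whatever the database did not return is missing from the table.
    return sorted(set(candidate_ids) - found)

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE task_comments (id INTEGER)")
    conn.executemany("INSERT INTO task_comments VALUES (?)", [(1,), (2,)])
    result = missing_ids(conn, [1, 2, 3])
    conn.close()
    return result
```

With the table from the question (ids 1 and 2) and the list 1,2,3, demo() yields [3]. For very long lists you would probably want to send the ids in chunks, since drivers and servers limit how large a single statement can get.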

Fast MAX, GROUP BY on the concatenation of multiple columns

I have a table with 4 columns: name, date, version, and value. There's a composite index on all four, in that order. It has 20M rows: 2,000 names, approx. 1,000 dates per name, approx. 10 versions per date.
I'm trying to get a list that gives, for every name, the highest date, the highest version on that date, and the associated value.
When I do
SELECT name,
MAX(date)
FROM table
GROUP BY name
I get good performance and the database uses the composite index.
However, when I join the table to this in order to get the MAX(version) per name, the query takes ages. There must be a way to get the result in about the same order of magnitude of time as the SELECT statement above? It can easily be done by using the index.
Try this, using a temporary table to hold the latest date per name (table is a reserved word in MySQL, so it is quoted with backticks):
CREATE TEMPORARY TABLE tmp_latest AS
SELECT name, MAX(date) AS date
FROM `table`
GROUP BY name;
SELECT t.name, t.date, MAX(t.version) AS version
FROM `table` t
INNER JOIN tmp_latest l ON t.name = l.name AND t.date = l.date
GROUP BY t.name, t.date;
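The question also asks for the associated value, which takes one more join back to the base table. Below is a sketch of that full chain under several assumptions: it runs on Python's built-in sqlite3 module rather than MySQL, the table is called readings (a made-up name, since table is a reserved word), and (name, date, version) is assumed to be unique. It says nothing about performance on 20M rows.

```python
import sqlite3

def latest_per_name(rows):
    """rows: iterable of (name, date, version, value).
    Returns, per name, the row with the highest version on the highest date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (name TEXT, date TEXT, version INTEGER, value REAL)"
    )
    # Composite index in the same column order as in the question.
    conn.execute("CREATE INDEX readings_idx ON readings (name, date, version, value)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    result = conn.execute("""
        SELECT r.name, r.date, r.version, r.value
        FROM readings AS r
        JOIN (SELECT r2.name, r2.date, MAX(r2.version) AS version
              FROM readings AS r2
              JOIN (SELECT name, MAX(date) AS date
                    FROM readings
                    GROUP BY name) AS d
                ON r2.name = d.name AND r2.date = d.date
              GROUP BY r2.name, r2.date) AS v
          ON r.name = v.name AND r.date = v.date AND r.version = v.version
        ORDER BY r.name
    """).fetchall()
    conn.close()
    return result
```

The innermost subquery is the fast GROUP BY from the question, the middle one picks the highest version on that date, and the outer join fetches the value for that exact (name, date, version) triple.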