Let's say we have:
table1
a (int) | b (int)
--------|--------
1 | 4
2 | 4
table2
c (text) | d (text)
---------|---------
hoi | hi
Query:
SELECT * FROM table1
UNION
SELECT * FROM table2
yields
a | b
------|--------
1 | 4
2 | 4
hoi | hi
At least, that's what the query I just ran on MySQL returned.
I'd expect (1, 4, NULL, NULL). Why doesn't this give an error?
UNION just appends the rows of one query to the rows of the other. As long as the two queries return the same number of columns, there's no error. The column names always come from the first query. If the datatypes are different, it finds a common type that they can all be converted to; in your example, it converts the int columns to text. (MySQL is loose about this; some other databases require explicit CAST() calls to get everything to the same type.)
Since your queries each return two columns, the result contains two columns, using the column names from table1.
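If you want the same query to run on stricter databases, you can make the conversion explicit yourself; a minimal sketch using the tables above:

SELECT CAST(a AS CHAR) AS a, CAST(b AS CHAR) AS b FROM table1
UNION
SELECT c, d FROM table2;

(CAST(... AS CHAR) is the MySQL spelling; Postgres or SQL Server would cast to TEXT or VARCHAR instead.)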
This is a bit long for a comment.
I just tested this on MySQL 8.0 and SQLite and it returns:
a b
1 4
2 4
hoi hi
I find this surprising. I would expect the columns to be given an integer type, and then either a type conversion error or 0 for the third row. Well, actually the SQLite result isn't that strange, because types are much more fungible in SQLite.
SQL Server and Postgres give errors that I would expect -- type conversion errors that cause the query to fail.
Suppose we have two 3-bit numbers concatenated together, like '101100', which represents 5 and 4 combined. I want to be able to perform aggregate functions like SUM() or AVG() on this column, separately for each individual 3-bit part.
For instance:
'101100'
'001001'
sum(first three bits) = 6
sum(last three bits) = 5
I have already tried the SUBSTRING() function; however, speed is the issue in that case, as this query will run on millions of rows regularly, and string matching will slow the query.
I am also open to any new databases or technologies that may support this functionality.
You can use the function conv() to convert any part of the string to a decimal number:
select
  sum(conv(left(number, 3), 2, 10)) firstpart,
  sum(conv(right(number, 3), 2, 10)) secondpart
from tablename
See the demo.
Results:
| firstpart | secondpart |
| --------- | ---------- |
| 6 | 5 |
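If speed is the main concern and you can store the value as an integer instead of a string, bitwise operators avoid string parsing entirely. A sketch, assuming a hypothetical integer column number_int holding the same 6 bits:

select
  sum(number_int >> 3) firstpart,  -- top 3 bits (101 -> 5)
  sum(number_int & 7) secondpart   -- bottom 3 bits (100 -> 4); 7 is binary 111
from tablename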
With the current understanding I have of your schema (which is next to none), the best solution would be to restructure your schema so that each data point is its own record instead of all the data points being in the same record. Doing this allows you to have a dynamic number of data points per entry. Your resulting table would look something like this:
id | data_type | value
ID is used to tie all of your data points together. If you look at your current table, this would be whatever you are using for the primary key. For this answer, I am assuming id INT NOT NULL but yours may have additional columns.
Data Type indicates what type of data is stored in that record. This would be the current table's column name. I will be using data_type_N as my values, but yours should be a more easily understood value (e.g. sensor_5).
Value is exactly what it says it is, the value of the data type for the given id. Your values appear to be all numbers under 8, so you could use a TINYINT type. If you have different storage types (VARCHAR, INT, FLOAT), I would create a separate column per type (val_varchar, val_int, val_float).
The primary key for this table now becomes a composite: PRIMARY KEY (id, data_type). Since your previously single record will become N records, the primary key will need to adjust to accommodate that.
You will also want to ensure that you have indexes that are usable by your queries.
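A minimal DDL sketch of this layout (the VARCHAR size and index name are assumptions):

CREATE TABLE my_table (
  id        INT NOT NULL,
  data_type VARCHAR(32) NOT NULL,
  value     TINYINT NOT NULL,
  PRIMARY KEY (id, data_type),
  KEY idx_data_type (data_type)  -- supports grouping/filtering by data_type
);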
Some sample values (using what you placed in your question) would look like:
1 | data_type_1 | 5
1 | data_type_2 | 4
2 | data_type_1 | 1
2 | data_type_2 | 1
Doing this, summing the values now becomes trivial. You would only need to ensure that data_type_N is summed with data_type_N. As an example, this would be used to sum your example values:
SELECT data_type,
       SUM(value)
FROM my_table
WHERE id IN (1, 2)
GROUP BY data_type
Here is an SQL Fiddle showing how it can be used.
Is it possible to always respect an expected number-of-elements constraint by filling the remainder of a SQL result set with previously written data, keeping the data in insertion order? Using MySQL?
Edit
In a web store, I always want to show n elements. I update the shown elements every w seconds, and I want to loop indefinitely.
For example, using table myTable:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
Something like
SELECT id FROM myTable WHERE id > 3 ORDER BY id ALWAYS_RETURN_THIS_NUMBER_OF_ELEMENTS 5
would actually return (where ALWAYS_RETURN_THIS_NUMBER_OF_ELEMENTS doesn't exist)
+----+
| id |
+----+
| 4 |
| 5 |
| 4 |
| 5 |
| 4 |
+----+
This is a very strange need. Here is a method:
select id
from (SELECT id
FROM myTable
WHERE id > 3
ORDER BY id
LIMIT 5
) t cross join
(select 1 as n union all select 2 union all select 3 union all select 4 union all select 5
) n
order by n.n, id
limit 5;
You may need to extend the list of numbers in n to be sure you have enough rows for the final limit.
No, that's not what LIMIT does. The LIMIT clause is applied as the last step in the statement execution, after aggregation, after the HAVING clause, and after ordering.
I can't fathom a use case that would require the type of functionality you describe.
FOLLOWUP
The query that Gordon Linoff provided will return the specified result, as long as there is at least one row in myTable that satisfies the predicate. Otherwise, it will return zero rows.
Here's the EXPLAIN output for Gordon's query:
id select_type table type key rows Extra
-- ------------ ---------------- ----- ------- ---- -------------------------------
1 PRIMARY <derived2> ALL 5 Using temporary; Using filesort
1 PRIMARY <derived3> ALL 5 Using join buffer
3 DERIVED No tables used
4 UNION No tables used
5 UNION No tables used
6 UNION No tables used
7 UNION No tables used
UNION RESULT <union3,4,5,6,7> ALL
2 DERIVED myTable range PRIMARY 10 Using where; Using index
Here's the EXPLAIN output for the original query:
id select_type table type key rows Extra
-- ----------- ----------------- ----- ------- ---- -------------------------------
1 SIMPLE myTable range PRIMARY 10 Using where; Using index
It just seems like it would be a whole lot more efficient to reprocess the resultset from the original query, if that resultset contains fewer than five (and more than zero) rows. (When that number of rows goes from 5 to 1,000 or 150,000, it would be even stranger.)
The code to get multiple copies of rows from a resultset is quite simple: fetch the rows, and if the end of the result set is reached before you've fetched five (or N) rows, then just reset the row pointer back to the first row, so the next fetch will return the first row again. In PHP using mysqli, for example, you could use:
$result->data_seek(0);
Or, for those still using the deprecated mysql_ interface:
mysql_data_seek($result,0);
But if you're returning only five rows, it's likely you aren't even looping through the result at all, and you already stuffed all the rows into an array. Just loop back through the beginning of the array.
For MySQL interfaces that don't support a scrollable cursor, we'd just store the whole resultset and process it multiple times: with Perl DBI, using fetchall_arrayref; with JDBC (which is going to store the whole result set in memory anyway, without special settings on the connection), we'd store the resultset as an object.
Bottom line, squeezing this requirement (to produce a resultset of exactly five rows) back to the database server, and pulling back duplicate copies of a row and/or storing duplicate copies of a row in memory just seems like the wrong way to satisfy the use case. (If there's rationale for storing duplicate copies of a row in memory, then that can be achieved without pulling duplicate copies of rows back from the database.)
It's just very odd that you say you're using/implementing a "circular buffer", but that you choose not to "circle" back around to the beginning of a resultset which contains fewer than five rows, and instead need to have MySQL return you duplicate rows. Just very, very strange.
I want to find the rows in a table whose column value appears in a given string.
For example, I have these values in a column named 'atext' in a table named 'testing':
test
a
cool
another
Now I want to select the rows whose value is a word from the string 'this is a test', using SQL:
select * from testing where instr(atext, 'this is a test') >0;
but this does not select any rows.
Reverse the arguments to INSTR:
SELECT * FROM testing WHERE INSTR('this is a test', atext) > 0;
With a full-text index:
select * from testing where match (atext) against ('this is a test' in boolean mode);
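This assumes a FULLTEXT index exists on atext; one way to add it (InnoDB supports FULLTEXT from MySQL 5.6; before that you need MyISAM):

ALTER TABLE testing ADD FULLTEXT INDEX ft_atext (atext);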
This is a 'reversed' LIKE:
select * from testing where 'this is a test' LIKE CONCAT('%',atext,'%');
It can be slow on tables having a lot of records.
This returns the rows where the value of the atext column can be found in the given string.
(For example, it matches when atext = 'is a t', because that substring can be found in the given string.)
Or you can write a regex.
select * from testing where atext REGEXP '^(this|is|a|test)$';
This matches all rows that consist of exactly one of the specified words.
In your scripting or programming language, you only need to replace the spaces with |, wrap the alternatives in parentheses, add ^ to the beginning and $ to the end of the string, and use REGEXP instead of equality.
('this is a test' -> ^(this|is|a|test)$)
If you have a lot of records in the table, these queries can be slow, because the SQL engine does not use indexes for REGEXP queries.
So if your table has a lot of rows and holds no more than about 4,000,000 distinct words, I recommend building an indexing table. Example:
originalTable:
tid | atext (text)
1 | this is
2 | a word
3 | a this
4 | this word
5 | a is
....
indexTable:
wid | word (varchar)
1 | this
2 | is
3 | a
4 | word
switchTable:
tid | wid
1 | 1
1 | 2
2 | 3
2 | 4
3 | 1
3 | 3
...
You should add indexes on the tid, wid, and word fields.
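A possible DDL sketch for the three tables (names and column sizes are assumptions):

CREATE TABLE originalTable (
  tid   INT PRIMARY KEY,
  atext TEXT
);

CREATE TABLE indexTable (
  wid  INT PRIMARY KEY,
  word VARCHAR(64) NOT NULL,
  UNIQUE KEY idx_word (word)  -- indexed lookups by word
);

CREATE TABLE switchTable (
  tid INT NOT NULL,
  wid INT NOT NULL,
  PRIMARY KEY (tid, wid),
  KEY idx_wid (wid)  -- supports the join from indexTable
);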
Then the query is:
SELECT o.*
FROM originalTable as o
JOIN switchTable as s ON o.tid = s.tid
JOIN indexTable as i on i.wid=s.wid
WHERE i.word IN ('this', 'is', 'a', 'test')
This query can be much faster if your originalTable has a lot of records, because here the SQL engine can do indexed searches. But there is a bit more work on insert: when you insert a row into the original table, you must also insert into the other two tables.
Which of the three queries is fastest depends on your table size, and on whether you want to optimize for inserts or for selects (the ratio of insert/update to select queries).
Is there any SQL to get the first numbers not listed in my MySQL database table?
Ex:
Table:
Users
ID | Name | Number
------------------------
1 | John | 1456
2 | Phil | 345
3 | Jenny | 345612
In this case the SQL should return the numbers from 1 to 344, 346 to 1455, and 1457 to 345611.
Any suggestions? Maybe with some procedure?
I like the answer by @pst but would suggest another alternative.
Create a new table of unassigned numbers, insert a few thousand rows or so in there.
Present some of those numbers to the user.
When a number is used, delete it from the unassigned numbers table.
Periodically generate more unassigned numbers as needed.
The generation of those unassigned numbers could use the random method suggested by @pst, but this moves the uncertainty of how long it takes to generate a list of unassigned numbers into a batch task, rather than doing it at the front end while the user is waiting. This probably isn't an issue while usage of the number space is sparse, but it becomes a bigger issue as more of the number space gets used.
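A sketch of that workflow, assuming a hypothetical unassigned_numbers pool table:

CREATE TABLE unassigned_numbers (number INT PRIMARY KEY);

-- present a few candidates to the user
SELECT number FROM unassigned_numbers LIMIT 6;

-- when the user picks one, claim it atomically
START TRANSACTION;
DELETE FROM unassigned_numbers WHERE number = 346;
INSERT INTO Users (Name, Number) VALUES ('Alice', 346);  -- assumes ID is auto-increment
COMMIT;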
Given the comment(s), my first approach would be to use a "random number" probe. This approach assumes:
Number is indexed; and
There are "significantly fewer" users than available numbers
Approach:
Choose N (e.g. 1-10) numbers at random on the client;
Query the database for Number IN (ns..), or Number = n for N=1; then
Whether a number is available can be detected by the requested record(s) not being found.
A size of N=1 is likely "okay" in this case, and it is the most trivial to implement, although it will require at least 6 database requests to find 6 free numbers. A larger N would decrease the number of trips to the database.
Make sure to use transactions.
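A minimal sketch of the probe, using the Users table from the question and a few client-generated candidates:

-- any candidate NOT returned by this query is free
SELECT Number
FROM Users
WHERE Number IN (87, 412, 90210);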
SELECT 'start', 1 AS number FROM tableA
UNION
SELECT 'min', number - 1 number FROM tableA
UNION
SELECT 'max', number + 1 number FROM tableA
ORDER BY number
You can check the answer at http://www.sqlfiddle.com/#!2/851de/6
Then you can compare against the missing numbers the next time you populate the table.
Just use an auto increment column. The database will assign the next number automatically. You don't need to even know what it is at the time of the insert. Just tell the user the number he got, don't give him a choice at all.
Based on your comments, the approach below might work for you. It doesn't really answer your specific question, but it probably meets your requirements.
I'm going to assume your requirements cannot change (e.g., presenting users with 6 possible id choices). Frankly I think it's a bit of a weird requirement, but it makes for some interesting SQL. :-)
Here's my approach: generate 10 random numbers. Filter out any already in the database. Present 6 of these random numbers to your user. Random id numbers have very nice properties with respect to transactionality compared to sequential id numbers, so this should scale very nicely should your app become popular.
SELECT
temp.i
FROM
(
SELECT 18 AS i -- 10 random
UNION SELECT 42 -- numbers.
UNION SELECT 88
UNION SELECT 191 -- Let's assume
UNION SELECT 192 -- you generated
UNION SELECT 193 -- these in the
UNION SELECT 1000 -- application
UNION SELECT 123456 -- layer.
UNION SELECT 1092930
UNION SELECT 9892919
) temp
LEFT JOIN
mytable ON (temp.i = mytable.i)
WHERE
mytable.i IS NULL -- filter out collisions
LIMIT
6 -- limit results to 6
SQL pop quiz time!!!
Why does the line "WHERE mytable.i IS NULL" filter collisions? (Hint: How can mytable.i be null when it's a primary key?)
Here's some test data:
CREATE TABLE mytable (i BIGINT PRIMARY KEY) ;
INSERT INTO mytable VALUES (88), (3), (192), (123456) ;
Run the query above, and here's the result. Notice that 88, 192, and 123456 were filtered out, since they would be collisions against the test data.
+---------+
| i |
+---------+
| 18 |
| 42 |
| 191 |
| 193 |
| 1000 |
| 1092930 |
+---------+
And how to generate those random numbers? Probably rand() * 9223372036854775807 would work. (Assuming you don't want negative numbers!)
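A sketch of that in MySQL (note that RAND() returns a double, so you get at most about 2^53 distinct values over that range, which is still plenty here):

SELECT FLOOR(RAND() * 9223372036854775807) AS candidate;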
I have a PostgreSQL 9.2 DB which automatically collects data from various machines.
The DB stores all the data, including the machine id, the firmware, the manufacturer id, etc., as well as the actual result data. In one stored field (varchar) there are 5 sub-fields, which are separated by the ^ character.
ACT18!!!8246-EN-2.00013151!1^7.00^F5260046959^H1P1O1R1C1Q1L1^1 (Machine 1)
The order of this data seems to vary from one machine to another, e.g. machines 1, 2 and 3. The string above shows the firmware version, in this case "7.00", and it appears in sub-field 2. However, another machine sends the data in a different sub-field; in this case it is sub-field 3 and the value is "1":
BACT/ALERT^A.00^1^^ (Machine 2)
I want to store the values "7.00" and "1" in a different field in a separate table, using an AFTER INSERT trigger (CREATE TRIGGER t_machine_id ...), where I can choose which sub-field is used depending on the machine the data came from.
Is split_part the best function to do this? Can anyone supply example code that will do this? I can't find anything in the documentation.
You need to (a) split the data using something like regexp_split_to_table then (b) match which parts are which using some criteria, since you don't have field position-order to rely on. Right now I don't see any reliable rule to decide what's the firmware version and what's the machine number; you can't really say where field <> machine_number because if machine 1 had firmware version 1 you'd get no results.
Given dummy data:
CREATE TABLE machine_info(data text, machine_no integer);
INSERT INTO machine_info(data,machine_no) (VALUES
('ACT18!!!8246-EN-2.00013151!1^7.00^F5260046959^H1P1O1R1C1Q1L1^1',1),
('BACT/ALERT^A.00^1^^',2)
);
Something like:
SELECT machine_no, regexp_split_to_table(data,'\^')
FROM machine_info;
will give you a table of split data elements with machine number, but then you need to decide which fields are which:
machine_no | regexp_split_to_table
------------+------------------------------
1 | ACT18!!!8246-EN-2.00013151!1
1 | 7.00
1 | F5260046959
1 | H1P1O1R1C1Q1L1
1 | 1
2 | BACT/ALERT
2 | A.00
2 | 1
2 |
2 |
(10 rows)
You may find the output of substituting regexp_split_to_array more useful, depending on whether you can get any useful info from field order and how you intend to process the data.
regress=# SELECT machine_no, regexp_split_to_array(data,'\^')
FROM machine_info;
machine_no | regexp_split_to_array
------------+------------------------------------------------------------------
1 | {ACT18!!!8246-EN-2.00013151!1,7.00,F5260046959,H1P1O1R1C1Q1L1,1}
2 | {BACT/ALERT,A.00,1,"",""}
(2 rows)
Say there are two firmware versions; version 1 sends code^blah^fwvers^^ and version 2 and higher sends code^fwvers^blah^blah2^machineno. You can then differentiate between the two because you know that version 1 leaves the last two fields blank:
SELECT
machine_no,
CASE WHEN info_arr[4:5] = ARRAY['',''] THEN info_arr[3] ELSE info_arr[2] END AS fw_vers
FROM (
SELECT machine_no, regexp_split_to_array(data,'\^')
FROM machine_info
) string_parts(machine_no, info_arr);
results:
machine_no | fw_vers
------------+---------
1 | 7.00
2 | 1
(2 rows)
Of course, you've only provided two rows of sample data, so the real matching rules are likely to be more complex. Consider writing an SQL function that extracts the desired field(s) from the array passed to it.
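To answer the split_part question directly: split_part(data, '^', n) does extract the n-th sub-field, and you can choose n per machine inside the trigger. A sketch, assuming a hypothetical machine_results(machine_no, fw_vers) target table and the two-machine mapping above:

CREATE OR REPLACE FUNCTION extract_fw_vers() RETURNS trigger AS $$
BEGIN
  -- the field position per machine is an assumption; adjust to your mapping
  IF NEW.machine_no = 1 THEN
    INSERT INTO machine_results (machine_no, fw_vers)
    VALUES (NEW.machine_no, split_part(NEW.data, '^', 2));
  ELSE
    INSERT INTO machine_results (machine_no, fw_vers)
    VALUES (NEW.machine_no, split_part(NEW.data, '^', 3));
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_machine_id
AFTER INSERT ON machine_info
FOR EACH ROW EXECUTE PROCEDURE extract_fw_vers();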