I'm trying to change how we make some transformations in our tables on RDS MySQL. The table has 20 million records and 200 columns. We have a pipeline that runs monthly: we download the table to an EC2 instance, use Python to do the transformation, then re-upload it.
When I presented dbt, my boss wanted to see it working because of the benefits: everything will stay in SQL (I am the only Python person in our small 20-person company), and we will have documentation, automated tests and version control [all of this is really needed at the moment]. I made it happen: I wrote SQL in dbt that produces the same results as the Python script and runs directly on the MySQL database using this adapter: https://pypi.org/project/dbt-mysql/
There are some problems, and the one I think will help me most to solve first is about booleans in MySQL. I already know all that stuff about BOOLEAN, TINYINT(1), etc. But all columns intended to be "boolean" end up in the tables as INT, and I want them as TINYINT, because INT takes 4 times the space it should.
Edit: added more information thanks to feedback
My raw table comes with all columns as strings, and I'm trying to cast them to the correct types. As this one should be boolean, I expected it to be converted to tinyint(1). When I create a table via pandas and there is a bool column, the table column is tinyint(1). But when I try to do something like this in SQL, the column becomes int.
The code is really just that:
SELECT IF(myStrColumn = '1', TRUE, FALSE)
FROM myRawTable
The resulting column comes out as int, but I wanted it to be tinyint(1) to represent a boolean.
tinyint is not a valid target type for CAST as per the documentation (https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast), so that doesn't work.
After looking at the MySQL docs, I think you have two options:
1. Create a new, custom table materialization that allows you to leverage the MySQL syntax:
create table my_table (my_col tinyint) as select ...
2. Add a post-hook that narrows the column after you've created the table:
{{ config(
    materialized="table",
    post_hook="alter table {{ this }} modify my_col tinyint"
) }}
For #1, there is a guide to creating materializations in the dbt docs, but it is a complex and advanced topic. I think the dbt-mysql adapter uses the vanilla/default table materialization in the global project. You may want to check out the MySQL incremental materialization macro, which is here.
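For a model with many flag columns, the statement used in the post-hook can be generated rather than typed by hand. A minimal Python sketch, assuming hypothetical names (`narrowing_statement`, `is_active`, `is_deleted`) and assuming every listed column only ever holds 0, 1 or NULL. It only builds the SQL string, which you would then paste into `post_hook`; it does not talk to the database or to dbt:

```python
def narrowing_statement(table, columns, nullable=True):
    """Build one ALTER TABLE that changes each listed column to TINYINT(1).

    `table` is inserted as-is, so it can be a plain name or the
    "{{ this }}" placeholder used inside a dbt post-hook string.
    """
    if not columns:
        raise ValueError("at least one column is required")
    null_clause = "NULL" if nullable else "NOT NULL"
    # One MODIFY clause per column, all in a single ALTER TABLE statement.
    parts = [f"MODIFY `{col}` TINYINT(1) {null_clause}" for col in columns]
    return f"ALTER TABLE {table} " + ", ".join(parts)


if __name__ == "__main__":
    print(narrowing_statement("{{ this }}", ["is_active", "is_deleted"]))
```

Putting all columns into a single ALTER TABLE, instead of one statement per column, should let MySQL apply the changes in one pass over the 20-million-row table.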
Related
So I'm kind of stumped.
I have a MySQL project that involves a database table that is being manipulated and altered by scripts on a regular basis. This isn't so unusual, but I need to automate a script (run after hours, when changes aren't happening) that would save the result of the following:
SHOW CREATE TABLE [table-name];
This command generates the ready-to-run script that would create the (empty) table in its current state.
In SqlWorkbench and Navicat, the result of this SHOW command is displayed in a field of a result set, as if it were the result of a SELECT statement.
Ideally, I want to take it into a variable in a procedure and change the table name, adding a '-mm-dd-yyyy' to the end of it, so I could show the day-to-day changes in the table schema on an active server.
However, I can't seem to do that. Unlike a SELECT result set, I can't use it like that. I can't get it into a variable, or save it to a temporary or physical table, or anything. I even tried to return it as a value from a function, which gave me the error that a function cannot return a result set - which explains why it's displayed like one in the db clients.
I suspect that this is a security thing in MySQL? If so, I can totally understand why and see the dangers of exposing it to a hacker, but this isn't a public-facing box at all, and I have full root/admin access to it. Hopefully somebody has already tackled this problem before.
This is on MySQL 8, btw.
[Edit] After the first comments, I need to add: I'm not concerned about the data in this question at all, only about the schema changes.
What I'd really -like- to do is this:
SELECT `Create Table` FROM ( SHOW CREATE TABLE carts )
But this seems to be mixing apples and oranges, as SHOW and SELECT aren't created equal, although they both seem to return the same sort of object
You cannot do it in the MySQL stored procedure language.
https://dev.mysql.com/doc/refman/8.0/en/show.html says:
Many MySQL APIs (such as PHP) enable you to treat the result returned from a SHOW statement as you would a result set from a SELECT; see Chapter 29, Connectors and APIs, or your API documentation for more information. In addition, you can work in SQL with results from queries on tables in the INFORMATION_SCHEMA database, which you cannot easily do with results from SHOW statements. See Chapter 26, INFORMATION_SCHEMA Tables.
What is absent from this paragraph is any mention of treating the results of SHOW commands like the results of SELECT queries in other contexts. There is no support for setting a variable to the result of a SHOW command, or using INTO, or running SHOW in a subquery.
So you can capture the result returned by a SHOW command in a client programming language (Java, Python, PHP, etc.), and I suggest you do this.
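As a rough illustration of the client-side approach, here is a Python sketch. It assumes a DB-API cursor from whichever MySQL driver you use, with the default tuple-style rows (the DDL is the second column of the single row that SHOW CREATE TABLE returns). The function names are hypothetical, and the table name is interpolated directly, so it must come from a trusted source:

```python
from datetime import date


def fetch_create_table(cursor, table):
    """Run SHOW CREATE TABLE and return the DDL text (second column of the row)."""
    # Identifiers cannot be bound as parameters, so the name is interpolated.
    cursor.execute(f"SHOW CREATE TABLE `{table}`")
    row = cursor.fetchone()
    return row[1]


def dated_ddl(ddl, table, day):
    """Rename the table in the DDL by appending -mm-dd-yyyy to its name."""
    old = f"CREATE TABLE `{table}`"
    if not ddl.startswith(old):
        raise ValueError(f"DDL does not start with {old!r}")
    new = f"CREATE TABLE `{table}-{day.strftime('%m-%d-%Y')}`"
    # Only the leading table name is replaced; the rest of the DDL is untouched.
    return new + ddl[len(old):]
```

A scheduled job could call `dated_ddl(fetch_create_table(cursor, "carts"), "carts", date.today())` and then either execute the result or write it to a file for diffing.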
In theory, all the information used by SHOW CREATE TABLE is accessible in the INFORMATION_SCHEMA tables (mostly TABLES and COLUMNS), but formatting a complete CREATE TABLE statement is a non-trivial exercise, and I wouldn't attempt it. For one thing, there are new features in every release of MySQL, e.g. new data types and table options, etc. So even if you could come up with the right query to produce this output, in a couple of years it would be out of date and it would be a thankless code maintenance chore to update it.
The closest solution I can think of, in pure MySQL, is to regularly clone the table structure (no data), like so:
CREATE TABLE backup_20220618 LIKE my_table;
As far as I know, to get your hands on the full explicit CREATE TABLE statement, as a string, would require the use of an external tool like mysqldump which was designed specifically for that purpose.
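The dated clone above needs a new table name every day, which is awkward to produce in plain SQL. One way to sketch it is to build the statement in a small script and send it from there. The helper name and the `backup_` prefix default are assumptions taken from the example statement:

```python
from datetime import date


def clone_statement(table, day, prefix="backup_"):
    """Return CREATE TABLE <prefix>yyyymmdd LIKE <table> (structure only, no data)."""
    return f"CREATE TABLE `{prefix}{day.strftime('%Y%m%d')}` LIKE `{table}`"


if __name__ == "__main__":
    print(clone_statement("my_table", date.today()))
```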
I have a table in mysql with geometry data in one of the columns. The datatype is text and I need to save it as Polygon geometry.
I have tried a few solutions, but keep running into the error "Invalid GIS data provided to function st_polygonfromtext."
Here's some data to work with and an example:
https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=78ac63e16ccb5b1e4012c21809cba5ff
The table has 25k rows, and there are likely some bad geometries in there. When I attempt to update a subset of rows, it works, like it did in the fiddle example. It fails when I attempt to update all 25k rows.
Someone suggested wrapping the statements in TRY and CATCH (see "Detecting faulty geometry WKT and returning the faulty record").
I am not too familiar with using them in MySQL, or with stored procedures either.
I need a spatial index on the table to be able to use spatial functions and filter queries by location.
Plan A: Create a new table and try to convert as you INSERT IGNORE INTO that table from your existing table. I don't know whether "IGNORE" applies to conversion failures. Also, you would end up with only the "good" values. What do you want to do about the "bad" values?
Plan B: Write a loop in application code -- read one row, convert the varchar value, check for errors.
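Plan B could look roughly like the following Python sketch. It assumes a DB-API cursor whose driver uses the `%s` placeholder style (as PyMySQL and MySQL Connector/Python do), and it catches a broad `Exception` because each driver raises its own error classes. The function name and the `(id, wkt)` row shape are hypothetical:

```python
def find_bad_wkt(cursor, rows):
    """Return [(row_id, error_message)] for every WKT that the server rejects.

    `rows` is any iterable of (row_id, wkt_text) pairs, e.g. the result of
    selecting the id and the text column from the existing table.
    """
    bad = []
    for row_id, wkt in rows:
        try:
            # Let MySQL parse the text; only success or failure matters here.
            cursor.execute("SELECT ST_PolygonFromText(%s) IS NULL", (wkt,))
            cursor.fetchall()
        except Exception as exc:  # driver-specific error classes vary
            bad.append((row_id, str(exc)))
    return bad
```

The returned list identifies the rows to fix or exclude, after which the bulk UPDATE can be run on the remaining rows only.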
I have a problem with an MS Access 2007 table connected via ODBC to a MySQL server (not Microsoft SQL Server).
If the unique identifier in the MySQL table is BIGINT, the content of every cell is displayed as "#Deleted".
I have found this article:
"#Deleted" errors with linked ODBC tables (at support.microsoft.com)
and it says:
The following are some strategies that you can use to avoid this behavior:
Avoid entering records that are exactly the same except for the unique index.
Avoid an update that triggers updates of both the unique index and another field.
Do not use a Float field as a unique index or as part of a unique index because of the inherent rounding problems of this data type.
Do all the updates and inserts by using SQL pass-through queries so that you know exactly what is sent to the ODBC data source.
Retrieve records with an SQL pass-through query. An SQL pass-through query is not updateable, and therefore does not cause "#Delete" errors.
Avoid storing Null values within any field making up the unique index of your linked ODBC table.
but none of these things "to avoid" apply to me. My problem is BIGINT. To confirm this, I created 2 tables, one with an INT id and one with a BIGINT id, and only the BIGINT one shows the problem.
I can't change BIGINT to INT in my production database.
Is there any way to fix this?
I'm using: Access 2007, mysql-connector-odbc-3.51.30-winx64, MySQL server 5.1.73.
You can try basing the form on an Access query and converting the BIGINT to an integer in the query using CInt() (or CLng(), since CInt() only covers the 16-bit Integer range). This happens before the form processing. Depending on your circumstances, you may need to convert to a string (CStr()) in the query, and then manually validate that the user has entered a number using IsNumeric. The idea is to trick the form into not trying to interpret the datatype, which seems to be your problem.
Access 2016 now supports BigInt: https://blogs.office.com/2017/03/06/new-in-access-2016-large-number-bigint-support/
It's 2019 and with the latest ODBC driver from Oracle (v 8.0.17) and Access 365 (v 16.0.11904), the problem still occurs.
When the ODBC option "Treat BIGINT columns as INT columns" is ticked and BigInt support is enabled in the Access options, linked tables with BIGINT #id columns (the primary key) show as deleted. Ruby creates these by default, so we are loath to fiddle with that.
If we disable the above two options, Access thinks the BIGINT #id column is a string and shows the data. But then the field type is no longer bigint or int.
This is quite pathetic, since this problem is almost 10 years old now.
The MySQL driver has an option to convert BIGINT values to INT. Would this solve the issue for you?
PROCEDURE ANALYSE() suggests the optimal field type for each of my columns. I want to create a new table with optimal field types starting from a table that I already have. At the moment I'm running
SELECT * FROM mytable PROCEDURE ANALYSE();
Then I copy the report and write the CREATE statement manually. Is there a way to do that automatically? Is it more efficient to alter the table with the new field types, or to create a new empty table with the optimal field types and re-import the data?
In truth, you would not want to blindly accept the data types returned by this analysis, as you could never be sure in advance what data types would be suggested. The procedure returns "suggested" optimal data types that "may" help reduce data storage requirements. The return values also depend on the data in the table you're selecting from and could change each time you run the query on new data.
Read more here on dev.mysql
But if you wanted to try something, I would start by building a procedure of my own that passes the recommended data types into a dynamically created DDL statement, which you would need to check for incorrect data types before executing it. It might take a little working out in terms of your code, but you really should read more on PROCEDURE ANALYSE() first.
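One way to sketch the "generate DDL, then review it" idea is outside the database, in Python. This assumes you have already fetched the `Field_name` and `Optimal_fieldtype` columns of the PROCEDURE ANALYSE() report as pairs, and that `Field_name` is qualified as `db.table.column`; the function name is hypothetical. Note that PROCEDURE ANALYSE() is not available in MySQL 8.0, so this applies to older servers only:

```python
def ddl_from_analyse(table, analyse_rows):
    """Build an ALTER TABLE from (Field_name, Optimal_fieldtype) pairs.

    The statement is returned as text for manual review; nothing is executed.
    """
    clauses = []
    for field_name, optimal_type in analyse_rows:
        column = field_name.rsplit(".", 1)[-1]  # drop the db.table. prefix
        clauses.append(f"  MODIFY `{column}` {optimal_type}")
    if not clauses:
        raise ValueError("no columns to modify")
    return f"ALTER TABLE `{table}`\n" + ",\n".join(clauses) + ";"
```

Printing the result and editing it by hand keeps a person in the loop, which matters because the suggestions depend on whatever data happened to be in the table.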
I have a dataset with a lot of columns that I want to import into a MySQL database, so I want to be able to create tables without specifying the column headers by hand. Instead, I want to supply a file with the column labels in it to (presumably) the MySQL CREATE TABLE command. I'm using the standard MySQL Query Browser tools in Ubuntu, but I didn't see an option for this in the create table dialog, nor could I figure out how to write a query to do this from the CREATE TABLE documentation page. But there must be a way...
A CREATE TABLE statement includes more than just column names:
Table name*
Column names*
Column data types*
Column constraints, like NOT NULL
Column options, like DEFAULT, character set
Table constraints, like PRIMARY KEY* and FOREIGN KEY
Indexes
Table options, like storage engine, default character set
* mandatory
You can't get all this just from a list of column names. You should write the CREATE TABLE statement yourself.
Re your comment: Many software development frameworks support ways to declare tables without using SQL DDL. E.g. Hibernate uses XML files. YAML is supported by Rails ActiveRecord, PHP Doctrine and Perl's SQLFairy. There are probably other tools that use other formats such as JSON, but I don't know one offhand.
But eventually, all these "simplified" interfaces are no less complex to learn than SQL, while failing to represent exactly what SQL does. See also The Law of Leaky Abstractions.
Check out SQLFairy, because that tool might already convert from files to SQL in a way that can help you. And FWIW MySQL Query Browser (or under its current name, MySQL Workbench) can read SQL files. So you probably don't have to copy & paste manually.
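If typing hundreds of column names is the main burden, a small script can at least draft the statement for you to finish by hand. The Python sketch below is only that: the function name is hypothetical, every column gets the same placeholder type, and the types, constraints and keys listed above still have to be filled in before the result is usable:

```python
def draft_create_table(table, header_line, delimiter=",", placeholder_type="TEXT"):
    """Draft a CREATE TABLE with one placeholder-typed column per header label."""
    names = [name.strip() for name in header_line.split(delimiter) if name.strip()]
    if not names:
        raise ValueError("header line contains no column names")
    columns = ",\n".join(f"  `{name}` {placeholder_type}" for name in names)
    return f"CREATE TABLE `{table}` (\n{columns}\n);"


if __name__ == "__main__":
    # Example header line; in practice, read the first line of your file.
    print(draft_create_table("survey", "id,name,score"))
```

The output can be saved to a .sql file, edited, and then opened in MySQL Workbench.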