Perhaps someone could provide some insight into a problem I have.
I have a SQL Server database which receives information every hour and is updated from a stored procedure using a Bulk Insert. This all works fine, however the end result is to pull this information into Excel.
Establishing the data connection worked fine as well, until I attempted some calculations. The imported data is all formatted as text. Excel's number formats aren't working so I decided looking at the table in the database.
All the columns are set to varchar for the Bulk Insert to work so I changed a few to numeric. Refreshed in Excel and the calculations worked.
After repeat attempts I've not been able to get the Bulk Insert to work, even generating a format file with bcp it still returned errors on the insert. Could not convert varchar to numerical, after some further searching it was only failing on one numerical column which is generally empty.
Other than importing the data with VBA and converting it like that or adding zero to every imported value so Excel converts it.
Any suggestions are welcome.
Thanks!
Thanks for the replies I had considered using =value() in Excel but wanted to try avoid the additional formulas.
I was eventually able to resolve my problem by generating a format file for the Bulk Insert using the bcp utility. Though getting it to generte a file proved tricky enough below is an example of how I generated it.
At an elevated cmd:
C:\>bcp databasename.dbo.tablename format nul -c -x -f outputformatfile.xml -t, -S localhost\SQLINSTANCE -T
This generated an xml format file for the specific table. As my table had two additional columns which weren't in the source data I edited the XML and removed them. They were uniqueid and getdate columns.
Then I changed the Bulk Insert statement so it used the format file:
BULK INSERT [database].[dbo].[tablename]
FROM 'C:\bulkinsertdata.txt'
WITH (FORMATFILE='C:\outputformatfile.xml',FIRSTROW=3)
Using this method I was able to use the numeric and int datatypes successfully. Going back to Excel when the data connection was refreshed it was able to determine the correct datatypes.
Hope that helps someone!
Related
I'm using a database with MySQL 5.7, and sometimes, data needs to be updated using a mixture of scripts and manual editing. Because people working with the database are usually not familiar with SQL, I'd like to export the data as a TSV, which then could be manipulated (for example with Python's pandas module) and then be imported back. I assume the standard way would be to directly connect to the database, but using TSVs has some upsides in this situation, I think. I've been reading the MySQL docs and some stackoverflow questions to find the best way to do this. I've found a couple of solutions, however, they all are somewhat inconvenient. I will list them below and explain my problems with them.
My question is: did I miss something, for example some helpful SQL commands or CLI options to help with this? Or are the solutions I found already the best when importing/exporting TSVs?
My example database looks like this:
Database: Export_test
Table: Sample
Field
Type
Null
Key
id
int(11)
NO
PRI
text_data
text
NO
optional
int(11)
YES
time
timestamp
NO
Example data:
INSERT INTO `Sample` VALUES (1,'first line\\\nsecond line',NULL,'2022-02-16 20:17:38');
The data contains an escaped newline, which caused a lot of problems for me when exporting.
Table: Reference
Field
Type
Null
Key
id
int(11)
NO
PRI
foreign_key
int(11)
NO
MUL
Example data:
INSERT INTO `Reference` VALUES (1,1);
foreign_key is referencing a Sample.id.
Note about encoding: As a caveat for people trying to do the same thing: If you want to export/import data, make sure that characters sets and collations are set up correctly for connections. This caused me some headache, because although the data itself is utf8mb4, the client, server and connection character sets were latin1, which caused some loss of data in some instances.
Export
So, for exporting, I found basically three solutions, and they all behave somewhat differently:
A: SELECT stdout redirection
mysql Export_test -e "SELECT * FROM Sample;" > out.tsv
Output:
id text_data optional time
1 first line\\\nsecond line NULL 2022-02-16 21:26:13
Pros:
headers are added, which makes it easy to use with external programs
formatting works as intended
Cons:
NULL is used for null values; when importing, \N is required instead; as far as I know, this can't be configured for exports
Workaround: replace NULL values when editing the data
B: SELECT INTO OUTFILE
mysql Export_test -e "SELECT * FROM Sample INTO OUTFILE '/tmp/out.tsv';"
Output:
1 first line\\\
second line \N 2022-02-16 21:26:13
Pros:
\N is used for null data
Cons:
escaped linebreaks are not handled correctly
headers are missing
file writing permission issues
Workaround: fix linebreaks manually; add headers by hand or supply them in the script; use /tmp/ as output directory
C: mysqldump with --tab (performs SELECT INTO OUTFILE behind the scenes)
mysqldump --tab='/tmp/' --skip-tz-utc Export_test Sample
Output, pros and cons: same as export variant B
Something that should be noted: the output is only the same as B, if --skip-tz-utc is used; otherwise, timestamps will be converted to UTC, and will be off after importing the data.
Import
Something I didn't realize it first, is that it's impossible to merely update data directly with LOAD INTO or mysqlimport, although that's something many GUI tools appear to be doing and other people attempted. For me as an beginner, this wasn't immediately clear from the MySQL docs. A workaround appears to be creating an empty table, import the data there and then updating the actual table of interest via a join. I also thought one could update individual columns with this, which again is not possible. If there are some other ways to achieve this, I would really like to know.
As far as I could tell, there are two options, which do pretty much the same thing:
LOAD INTO:
mysql Export_test -e "SET FOREIGN_KEY_CHECKS = 0; LOAD DATA INFILE '/tmp/Sample.tsv' REPLACE INTO TABLE Sample IGNORE 1 LINES; SET FOREIGN_KEY_CHECKS = 1;"
mysqlimport (performs LOAD INTO behind the scenes):
mysqlimport --replace Export_test /tmp/Sample.tsv
Notice: if there are foreign key constraints like in this example, SET FOREIGN_KEY_CHECKS = 0; needs to be performed (as far as I can tell, mysqlimport can't be directly used in these cases). Also, IGNORE 1 LINES or --ignore-lines can be used to skip the first line if the input TSV contains a header. For mysqlimport, the name of the input file without extension must be the name of the table. Again, file reading permissions can be an issue, and /tmp/ is used to avoid that.
Are there ways to make this process more convenient? Like, are there some options I can use to avoid the manual workarounds, or are there ways to use TSV importing to UPDATE entries without creating a temporary table?
What I ended up doing was using LOAD INTO OUTFILE for exporting, added a header manually and also fixed the malformed lines by hand. After manipulating the data, I used LOAD DATA INTO to update the data. In another case, I exported with SELECT to stdout redirection, manipulated the data and then added a script, which just created a file with a bunch of UPDATE ... WHERE statements with the corresponding data. Then I ran the resulting .sql in my database. Is the latter maybe the best option in this case?
Exporting and importing is indeed sort of clunky in MySQL.
One problem is that it introduces a race condition. What if you export data to work on it, then someone modifies the data in the database, then you import your modified data, overwriting your friend's recent changes?
If you say, "no one is allowed to change data until you re-import the data," that could cause an unacceptably long time where clients are blocked, if the table is large.
The trend is that people want the database to minimize downtime, and ideally to have no downtime at all. Advancements in database tools are generally made with this priority in mind, not so much to accommodate your workflow of taking the data out of MySQL for transformations.
Also what if the database is large enough that the exported data causes a problem because where do you store a 500GB TSV file? Does pandas even work on such a large file?
What most people do is modify data while it remains in the database. They use in-place UPDATE statements to modify data. If they can't do this in one pass (there's a practical limit of 4GB for a binary log event, for example), then they UPDATE more modest-size subsets of rows, looping until they have transformed the data on all rows of a given table.
I convert excel to csv first, then import to phpmyadmin only import 100 rows, I changed config.inc buffer size but still did not changed the result. Could you please help me ???
My main idea to do this, compare two tables on mysql workbench, I have one table already sql, i need excel to convert sql then i can use "compare schemas" creating EER Model of existing database.
Good you described the purpose of this approach. This way I can tell you in advance that it will not help to convert that Excel data to a MySQL table.
The model features (sync, compare etc.) all work on meta data only. They do not consider any table content. So instead you should do a textual comparison, by converting the table you have in the server to CSV.
Comparing such large documents is however a challenge. If you only have a few changes then using a diff tool (visual like Araxis Merge or diff on the command line) may help. For larger changesets a small utility app (may self written) might be necessary.
I've been searching for a quick way to do this after my first few thoughts have failed me, but I haven't found anything.
My Issue
I'm importing raw client data into an Access database where the flat file they provide is parsed and converted into a standardized format for our organization. I do this for all of our clients, but this particular client's software gives us a file that puts "(NULL)" in every field that should be NULL. lol as a result, I have a ton of strings rather than a null field!
My goal is to do a data cleanse of the entire TABLE, rather than perform the cleanse at the FIELD level (as I do in my temporary solution below).
Data Cleanse
Temporary Solution:
I can't add those strings to our datawarehouse, so for now, I just have a query with an IIF statement check that replaces "(NULL)" with "" for each field (which took awhile to setup since the client file has roughly 96 fields). This works. However, we work with hundreds of clients, so I'd like to make a scale-able solution that doesn't require many changes if another client has a similar file; not to mention that if this client changes something in their file, I might have to redo my field specific statements.
Long-term Solution:
My first thought was an UPDATE query. I was hoping I could do something like:
UPDATE [ImportedRaw_T]
SET [ImportedRaw_T].* = ""
WHERE ((([ImportedRaw_T].* = "(NULL)"));
This would be easily scale-able, since for further clients I'd only need to change the table name and replace "(NULL)" with their particular default. Unfortunately, you can't use SELECT * with an update query.
Can anyone think of a work-around to the SELECT * issue for the update query, or have a better solution for cleansing an entire table, rather doing the cleanse at the field level?
SIDE NOTES
This conversion is 100% automated currently (Access is called via a watch folder batch), so anything requiring manual data manipulation / human intervention is out.
I've tried using a batch script to just cleanse the data in the .txt file before importing to Access - however, this caused an issue with the fixed-width format of the .txt, which has caused even larger issues with the automatic import of the file to Access. So I'd prefer to do this in Access if possible.
Any thoughts and suggestions are greatly appreciated. Thanks!
Unfortunately it's impossible to implement this in SQL using wildcards instead of column names, there is no such kind syntax.
I would suggest VBA solution, where you need to cycle thru all table fields and if field data type is string, generate and execute SQL UPDATE command for updating current field.
Also use Null instead of "", if you really need Nulls in the field instead of empty strings, they may work differently in calculations.
I have converted from a MySQL database to Postgres. During the conversion, the picture column in Postgres was created as bytea.
This Xojo code works in MySQL but not Postgres.
Dim mImage as Picture
mImage = rs.Field("Picture").PictureValue
Any ideas?
I don't know about this particular issue, but here's what you can do to find out yourself, perhaps:
Pictures are stored as BLOBs in the database. Now, this means that the column must also be declared as BLOB (or a similar binary type). If it was accidentally marked as TEXT, this would work as long as the database does not get exported by other means. I.e, as long as only your Xojo code reads and writes to the record, using the PictureValue functions, that takes care of keeping the data in BLOB form. But if you'd then convert to another database, the BLOB data would be read as text, and in that process it might get mangled.
So, it may be relevant to let us know how you converted the DB. Did you perform a export as SQL commands and then imported it into Postgres by running these commands again? Do you still have the export file? If so, find a record with picture data in it and see if that data is starting with: x' and then contains hex byte code, e.g. x'45FE1200... and so on. If it doesn't, that's another indicator for my suspicion.
So, check the type of the Picture column in your old DB first. If that specifies a binary data type, then the above probably does not apply.
Next, you can look at the actualy binary data that Xojo reads. To do that, get the BlobValue instead of the PictureValue, and store that in a MemoryBlock. Do the same for a single picture, both with the old and the new database. The memoryblock should contain the same bytes. If not, that would suggest that the data was not transferred correctly. Why? Well, that depends on how you converted it.
I am pulling data from a MySQL Server using an ODBC connection to the MySQL tables. Unfortunately, the table names are now being appended to the variables, which means I have to run an extra step after the data pull to rename every variable for use. Why is this happening and how can I prevent this?
E.G.,
MySQL table Sched has the following variables MAPID, TEST, TYPE, APPOINT, etc. but when I pull the data they come down as: Sched_MAPID, Sched_TEST, Sched_TYPE, Sched_APPOINT, etc.
I am using SAS 9.3 64b and setting a lib ref that includes the libname (MySQLLib, type of connection (odbc), the DSN, username and password. The connection obviously tests fine or I wouldn't be able to see the data, much less pull it. Basically, I set the data using the following:
libname MySQLLib odbc dsn=MySQLSSH user=&MySQLID pwd=&sqlpwd;
Data MySQLSched;
Set MySQLLib.Sched;
run;
Generally, a local SAS table (MySQLSched) is created that I can then manipulate and use as I need, without further touching the original Sched table. I still can, but every variable has the table name appended.
I am not sure if this is a SAS issue or a MySQL issue. I will also ask this same question in the SAS forums. IF I receive a pertinent answer there, I will update this post with that answer.
Any ideas?