Google BigQuery bulk load data into table - csv

I think that what I want to do is not feasible at the moment, but I want to clarify.
I have a bucket, say bucketA, with files served to the public, and a bucket, say bucketB, where the access logs of bucketA are stored in a specific CSV format.
What I want to do is run SQL queries against these access logs. The problem I have is that the logs are stored in separate CSVs (one per hour, I think). I tried to import them through the BigQuery UI, but it seems that there is a one-to-one CSV-to-table mapping. When you define the input location, the placeholder and documentation ask you to put gs://<bucket_name>/<path_to_input_file>.
Based on the above, my question is: Is it possible to load all the files in a bucket into a single BigQuery table, with something like an "*" asterisk operator?
Once the table is constructed, what happens when more files with data get stored in the bucket? Do I need to re-run the load, or is there a scheduler?

Based on the above, my question is: Is it possible to load all the files in a bucket into a single BigQuery table, with something like an "*" asterisk operator?
You can query them directly in GCS (federated source) or load them all into a native table, using * in both cases:
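For example, with the google-cloud-bigquery Python client (the bucket prefix, project, dataset and table names below are placeholders, not taken from the question), a minimal sketch of both routes:

from google.cloud import bigquery

client = bigquery.Client()

# Option 1: federated source - an external table that reads the CSVs in place.
external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://bucketB/access-logs/*"]  # * matches every hourly log file
external_config.options.skip_leading_rows = 1
external_config.autodetect = True
table = bigquery.Table("my_project.my_dataset.access_logs_external")
table.external_data_configuration = external_config
client.create_table(table)

# Option 2: load every matching file into a native table in a single job.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
client.load_table_from_uri(
    "gs://bucketB/access-logs/*",
    "my_project.my_dataset.access_logs",
    job_config=job_config,
).result()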
Once the table is constructed, what happens when more files with data get stored in the bucket? Do I need to re-run the load, or is there a scheduler?
If you leave it as an external table, then each time you query, BigQuery will scan all the files, so you'll automatically pick up new files/data. If you load it as a native table, then you'll need to schedule a job yourself to append each new file to your table.
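For the native-table route, appending a newly arrived file looks roughly like this (the file name is hypothetical, and the job would be triggered by whatever scheduler you choose, e.g. cron or Cloud Functions):

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,  # append rather than overwrite
)
client.load_table_from_uri(
    "gs://bucketB/access-logs/2017-01-01-13.csv",  # the new hourly log file
    "my_project.my_dataset.access_logs",
    job_config=job_config,
).result()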

Using the BigQuery web UI, I have created a new table and loaded some initial data with the standard CSV upload method.
For quick testing, how can I use the BigQuery web UI to insert more new data into the existing table?
I realized I CANNOT copy and paste multiple INSERT statements into the Query editor textbox.
INSERT INTO dataset.myschema VALUES ('new value1', 'more value1');
INSERT INTO dataset.myschema VALUES ('new value2', 'more value2');
Wow, it would be tedious to insert new rows of data one by one.
Luckily, BigQuery supports INSERT statements that use the VALUES syntax to insert multiple rows:
INSERT INTO dataset.myschema VALUES ('new value1', 'more value1'),
('new value2', 'more value2');


Bigquery - INSERT into Existing table - from local CSV

In BigQuery, I want to create a table and then load it from a CSV file on my local drive, in a single query.
I know the statements below are not correct; I am looking for an example of how to do it.
I can create the table, but I am not able to insert. Or is there another method (upsert, merge?)?
CREATE OR REPLACE TABLE Project1.DataSet_Creation.tbl_Store_List_Full
( Store_Nbr string(255),Sister_Store string(255))
,
INSERT INTO Project1.DataSet_Creation.tbl_Store_List_Full (Store_Nbr,Sister_Store)
FROM C:\GCP_Transition\tbl_Store_List_Full.csv
AFAIK, for this purpose you need to use the BigQuery web UI: in the project tab click Create table, choose Upload as the source and select the CSV file, enable schema auto-detect if it is disabled, and set the header rows to skip to 1 so that BigQuery takes your column names from the CSV header instead of treating it as a data row, as the docs suggest.
https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv#loading_csv_data_into_a_table
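If clicking through the UI for every file becomes tedious, the same upload can also be scripted with the google-cloud-bigquery Python client; a minimal sketch using the file path and table name from the question, with schema auto-detect standing in for the explicit column definitions:

from google.cloud import bigquery

client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,  # skip the header row
    autodetect=True,      # let BigQuery infer Store_Nbr / Sister_Store from the header
)

with open(r"C:\GCP_Transition\tbl_Store_List_Full.csv", "rb") as source_file:
    job = client.load_table_from_file(
        source_file,
        "Project1.DataSet_Creation.tbl_Store_List_Full",
        job_config=job_config,
    )
job.result()  # wait for the load job to finish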

How to insert data into a table using SQLAlchemy ORM?

I am trying to copy the data from one table to another table. Normally, using the SELECT command we can read the whole table, and using the INSERT command we can insert the data into another table. But I don't want to use raw SQL commands; I want to use the SQLAlchemy ORM to copy and insert. Is there any way to do it?
Are you just trying to add an entry to a database, or are you trying to duplicate an entry?
Adding would be done by simply doing:
# User is a mapped model and session an active Session, as in the SQLAlchemy tutorial
ed_user = User(name='ed', fullname='Ed Jones', nickname='edsnickname')
session.add(ed_user)
session.commit()
The example was taken from the official documentation. The commit will actually write the data added to the session, to the database.
EDIT:
You'll have to write something that parses the file into objects and adds those objects to the database. It depends on what kind of file it is: if it's a database export, then you can just import it with your preferred database tool. You can have a look at this blog post as well. The bottom line is that if you want to import from CSV / Excel / TXT, you'll have to write something for it.
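For the original table-to-table question, a minimal sketch of the same add/commit pattern applied to copying rows; the User and UserArchive models and the SQLite URL are assumptions standing in for your actual source and target tables:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Hypothetical source and target models with matching columns.
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    nickname = Column(String)

class UserArchive(Base):
    __tablename__ = "users_archive"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    fullname = Column(String)
    nickname = Column(String)

engine = create_engine("sqlite:///example.db")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # Read every row from the source table and stage a copy in the target table.
    for user in session.query(User).all():
        session.add(UserArchive(name=user.name,
                                fullname=user.fullname,
                                nickname=user.nickname))
    session.commit()  # the commit writes all the copied rows to the database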

Individual MySQL INSERT statements vs writing to a local CSV first and then LOAD DATA

I'm trying to extract information from 50 million HTML files into a MySQL database. My question is at what point during the process I should store the information in the MySQL database. For example, I'm considering these options:
Open each file and extract the information I need. Perform an INSERT after each file gets parsed.
Open each file and extract the information I need. Store the information into a CSV file as an intermediary. After all the files have been parsed into the CSV, perform a bulk upload using LOAD DATA INFILE
I know that LOAD DATA INFILE is much faster than individual INSERT statements if I already have the information in a CSV. However, if I don't have the information already in a CSV, I don't know if it's faster to create the CSV first.
At the crux of the question: Is writing to a local CSV faster or about the same as a single INSERT statement?
I'm using PHP in case it matters. Thanks in advance!
The key is not to do one insert per entry, but to batch the entries in memory and then perform a bulk insert.
See: https://dev.mysql.com/doc/refman/5.7/en/insert.html
INSERT statements that use VALUES syntax can insert multiple rows. To do this, include multiple lists of column values, each enclosed within parentheses and separated by commas. Example:
INSERT INTO tbl_name (a,b,c) VALUES(1,2,3),(4,5,6),(7,8,9);
ORMs like SQLAlchemy or Hibernate are smart enough (depending on configuration) to automatically batch your inserts.
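A sketch of that batching idea, in Python with mysql-connector rather than PHP, purely for illustration; the table and column names come from the manual's example above:

import mysql.connector

conn = mysql.connector.connect(user="user", password="pass", database="mydb")
cursor = conn.cursor()

# Collect the parsed values in memory, then insert the whole batch at once.
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]  # values extracted from the parsed HTML files
cursor.executemany(
    "INSERT INTO tbl_name (a, b, c) VALUES (%s, %s, %s)",
    rows,  # the connector rewrites this into a single multi-row INSERT
)
conn.commit()
cursor.close()
conn.close()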

Load xml data into sql database using phpmyadmin

I know this is a really basic question, but I am struggling with my first import of data from an XML file. I have created the table "Regions" which has just two columns - ID and Name. The XML file contains the same column names.
In order to bulk import the data, I am using the following SQL command:
LOAD XML LOCAL INFILE 'C:\Users\Dell\Desktop\regions.xml'
INTO TABLE Regions (ID, Name)
but I am getting the error #1148 - The used command is not allowed with this MySQL version.
Having researched this online, I found that allowing this command requires a change to one of the server configuration files, but my service provider doesn't give me access to it. Is there an alternative way to write the SQL so it does exactly the same thing as the code above, which is basically just importing the data from an XML file?
Many thanks
Since LOAD DATA INFILE isn't enabled for you, it appears you have only one more option, and that's to create an INSERT statement for each row. If you convert your XML file to CSV using Excel, that's an easy step. Assuming you have rows of data like this:
  A  | B
-----|-------------------------
  1  | Region 1
  2  | Region 2
I would create a formula like this in column C
=CONCATENATE("INSERT INTO Regions(ID,Name) VALUES(",A1,",'",B1,"');")
This will result in INSERT INTO Regions(ID,Name) VALUES(1,'Region 1'); for your first row. Fill this down to the last row of your spreadsheet, select all the INSERT statements, and copy them into the query text box inside phpMyAdmin; you should then be able to insert your values.
I've used this method many times when I needed to import data into a database.
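If you'd rather script it than use spreadsheet formulas, the same idea fits in a few lines of Python; this assumes the XML data has already been exported to a regions.csv file with ID and Name columns:

import csv

with open("regions.csv", newline="", encoding="utf-8") as f:
    for region_id, name in csv.reader(f):
        escaped = name.replace("'", "''")  # escape single quotes for SQL
        print(f"INSERT INTO Regions(ID,Name) VALUES({region_id},'{escaped}');")

Paste the printed statements into the phpMyAdmin query box just as with the spreadsheet approach.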

Create MySQL table from xls spreadsheet

I wonder if there is a (native) possibility to create a MySQL table from an .xls or .xlsx spreadsheet. Note that I do not want to import a file into an existing table with LOAD DATA INFILE or INSERT INTO, but to create the table from scratch, i.e. using the header row as the column names (with some default field type, e.g. INT), and then insert the data in one step.
So far I have used a Python script to build a CREATE statement and imported the file afterwards, but that approach feels somewhat clumsy.
There is no native MySQL tool that does this, but MySQL's PROCEDURE ANALYSE might help suggest the correct column types.
With a VB script you could do that. At my client we have a script which takes the worksheet name, the heading names and the field formats and generates a SQL script containing the CREATE TABLE and the INSERT INTO statements. We use Oracle, but the principle is the same for MySQL.
Of course you could do it in an even more sophisticated way by accessing MySQL from Excel via ODBC and posting the CREATE TABLE and INSERT INTO statements that way.
I cannot provide you with the script as it belongs to my client, but I can answer your questions on how to write such a script if you want to write one.
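For comparison, the scripted route the question already mentions can be compressed to a few lines with pandas, whose to_sql() creates the table (with inferred column types) and inserts the rows in one step; the file name, table name and connection string below are assumptions:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")

df = pd.read_excel("stores.xlsx")  # the first row of the sheet becomes the column names
df.to_sql("stores", engine, if_exists="replace", index=False)  # CREATE TABLE + INSERTs in one call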