AWS Data Pipeline - SQLActivity into S3 (MySQL DB) - mysql

What have I tried
As you can see here, this is my configuration for trying to save the query output to S3 as CSV. The query was successful (from what I've checked in the logs), but it doesn't seem to create any file at the S3 output location.
Main Problem
I know about the CopyActivity and I tried it successfully, but the problem is that my query is so long that it exceeds the 10240-character limit. From what I've researched, you can save the SQL file on S3 and then use SqlActivity with a Script URI pointing to that SQL file on S3. But it seems that only CopyActivity can create and save a CSV as output on S3.
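For reference, the Script URI wiring described above would look roughly like the SqlActivity fragment below in a pipeline definition. This is only an illustrative sketch: the IDs, the bucket path, and the referenced database, output data node, and EC2 resource are placeholders assumed to be defined elsewhere in the pipeline, and it does not by itself fix the missing CSV output.
{
  "id": "RunLongQuery",
  "name": "RunLongQuery",
  "type": "SqlActivity",
  "database": { "ref": "MyRdsMySqlDatabase" },
  "scriptUri": "s3://your-bucket/scripts/long-query.sql",
  "output": { "ref": "MyS3OutputLocation" },
  "runsOn": { "ref": "MyEc2Resource" }
}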
Questions
Is there any workaround for a long query so I can still use CopyActivity?
Is there a way to use SqlActivity and create a CSV file on S3?
Is breaking down the SQL query the only solution?

Related

Can't download CSV data from AWS Athena in usable format

I've got 70,000+ CSV files in an S3 bucket. They all have the same headers. I would like to combine the files into one CSV, which I want to download onto my machine.
Using AWS Athena, I seem to be most of the way there. I have created a database from the S3 bucket. I can then run queries like this:
select * from my_table_name limit 100
And see the results of the query (which in my case is combining many CSVs from S3) in the Athena console.
However, when I go to "Download results" for that query, I can't open the CSV in Excel (or a text editor).
Running
file -b my_table_name.csv
returns data.
I'm confused because I can visually see the results of my Athena query but can't download them in a usable file format. Am I missing something obvious for how to download this data? Why isn't it giving me a normal (perhaps UTF-8) CSV?
In the Athena settings, I had encryption turned on. Sorting that out solved it.

I want to compare the data I have in a CSV file to the data which is in the LDAP production server

I want to compare the data I have in a CSV file to the data which is in the LDAP production server.
There are thousands of users' records in the CSV file, and I want to compare them with the data in the production server.
Let's suppose I have user ID xtz12345 in the CSV file with uid number 123456. Now I want to cross-check the uidNumber of the same user ID xtz12345 in the production server.
Is there any way I can automate this? There are thousands of user IDs to check, and doing it manually would probably take a lot of time. Can anyone suggest what I should do?
A PowerShell script is a good place to start.
Import the ActiveDirectory module in PowerShell (assuming Windows AD; download and install the RSAT tools) to fetch information from AD.
Use Import-Csv in PowerShell to read the CSV values, then compare the first set against the second.
Happy to help.
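A rough sketch of that idea, assuming Windows AD with the RSAT ActiveDirectory module installed; the CSV path and the column names UserID and uidNumber are assumptions:
# Sketch only: the CSV path and the column names (UserID, uidNumber) are assumptions.
Import-Module ActiveDirectory
$rows = Import-Csv -Path "C:\data\users.csv"
foreach ($row in $rows) {
    # uidNumber is not returned by default, so request it explicitly.
    $adUser = Get-ADUser -Filter "SamAccountName -eq '$($row.UserID)'" -Properties uidNumber
    if ($null -eq $adUser) {
        Write-Output "$($row.UserID): not found in AD"
    }
    elseif ("$($adUser.uidNumber)" -ne "$($row.uidNumber)") {
        Write-Output "$($row.UserID): CSV uidNumber $($row.uidNumber) does not match AD uidNumber $($adUser.uidNumber)"
    }
}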

How to upload XML to MySQL in React with Axios and Node.js

I am trying to upload an XML file to a MySQL server.
I have a React web app, and I am using Axios and Node.js.
I was using the following statement to import the XML file into the product table directly from MySQL Workbench:
LOAD XML INFILE "C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/products.xml" INTO TABLE product ROWS IDENTIFIED BY '<Product>';
It worked fine.
Now I want to have a button that will upload a new XML file and replace the existing data in the table.
What I have tried so far is using the HTML file input element, grabbing the file from event.target.files[0], and sending the file object to the server with a POST request.
I am not really sure how to go on from here; I can't find a statement that can take the data out of the file object and import it into the SQL table.
Any ideas? What is the best way to go about it?
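For the upload step described above, a minimal sketch; the endpoint path and form field name are placeholders, not part of the original setup:
// Sketch only: '/api/products/upload-xml' and 'xmlFile' are made-up placeholder names.
const handleFileChange = async (event) => {
  const file = event.target.files[0];
  const formData = new FormData();
  formData.append('xmlFile', file);
  // Send the file as multipart/form-data to the Node.js server.
  await axios.post('/api/products/upload-xml', formData);
};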
I figured out my problem: my site was deployed to Heroku.
Apparently ClearDB, Heroku's SQL database add-on, does not allow the use of LOAD XML INFILE / LOAD DATA INFILE, as said here - https://getsatisfaction.com/cleardb/topics/load-data-local-infile.
What I ended up doing was converting the XML file to a JS object.
That solution presented a new problem: my XML file was around 3 MB, which came to over 12,000 rows to insert into the database.
MySQL does not allow inserting more than 1,000 rows in a single query.
I had to split the object into several chunks and loop through them, uploading each one by itself.
This process takes some time to execute, and I am sure there are better ways of doing it.
If anyone can shed some light on how best to go about it, or provide an alternative, I would appreciate it.
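For what it's worth, a rough sketch of that chunked-insert approach, assuming the XML has already been parsed into an array of row objects (for example with a parser such as xml2js) and that the mysql2 package is used; the connection settings and column names are made up:
// Sketch only: connection settings, column names, and the shape of `rows` are assumptions.
const mysql = require('mysql2/promise');

async function replaceProducts(rows) {
  const conn = await mysql.createConnection({ host: 'localhost', user: 'root', database: 'mydb' });
  try {
    // Replace the existing data before re-inserting.
    await conn.query('DELETE FROM product');

    const CHUNK = 1000; // stay under the per-query row limit mentioned above
    for (let i = 0; i < rows.length; i += CHUNK) {
      const chunk = rows.slice(i, i + CHUNK);
      const values = chunk.map(r => [r.id, r.name, r.price]);
      // Bulk insert one chunk per query.
      await conn.query('INSERT INTO product (id, name, price) VALUES ?', [values]);
    }
  } finally {
    await conn.end();
  }
}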

What is a good way to format a CSV file to be crawled in AWS Glue?

I currently have a CSV file with 200k lines that look like this:
id,path,username,folderid
32423423424,asfasf-232-3,cooluser,234324-234-34324-424
When the crawler finishes, it does say it created the table, and I can see the table details. But when I try to preview the data in Athena, it returns zero records. The CSV file is stored in an S3 bucket, and all permissions are correct.
Thanks.
Try keeping the file inside a particular folder and pointing the crawler at the folder. I think this will work better than pointing it at a particular file.
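For example, a layout along these lines (bucket and folder names are made up) lets the crawler's include path target the folder rather than the file:
s3://my-bucket/users-data/users.csv
Crawler include path: s3://my-bucket/users-data/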

BigQuery table to be extracted as JSON on a local machine

I have an idea of how to extract table data to Cloud Storage using the bq extract command, but I would rather like to know if there are any options to extract a BigQuery table as newline-delimited JSON to a local machine.
I can extract table data to GCS via the CLI and also download JSON data from the web UI, but I am looking for a solution that uses the bq CLI to download table data as JSON to a local machine. I am wondering whether that is even possible.
You need to use Google Cloud Storage for your export job. Exporting data from BigQuery is explained here; check also the variants for different path syntaxes.
Then you can download the files from GCS to your local storage.
The gsutil tool can help you download the file from GCS to your local machine.
You first need to export to GCS, then transfer it to your local machine.
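A sketch of that two-step route; the dataset, table, and bucket names are placeholders:
# Sketch only: mydataset.mytable and my-bucket are placeholders.
bq extract --destination_format NEWLINE_DELIMITED_JSON 'mydataset.mytable' gs://my-bucket/mytable-*.json
gsutil cp 'gs://my-bucket/mytable-*.json' .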
If you use the bq CLI tool, you can set the output format to JSON and redirect it to a file. This way you can achieve a local export of sorts, but it has certain other limits.
This exports the first 1000 lines as JSON:
bq --format=prettyjson query --n=1000 "SELECT * from publicdata:samples.shakespeare" > export.json
It's possible to extract data without using GCS, directly to your local machine, using the bq CLI.
Please see my other answer for details: BigQuery Table Data Export