Is there a way to load MongoDB snappy data file into MySQL? - mysql

I have a dataset that was exported from MongoDB and compressed as a snappy file, and I want to load the data into MySQL.
I have searched a lot but couldn't find a solution. So is there a way to load the snappy data from MongoDB into MySQL?
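Neither MySQL's import tools nor LOAD DATA read snappy directly, so one possible route is to decompress the dump, parse the documents, and bulk-insert them. Below is a minimal sketch, assuming the file is a raw-snappy-compressed mongoexport JSON-lines dump; the file name, table layout, and credentials are placeholders, not anything from the question.

```python
# Minimal sketch, NOT a verified pipeline. Assumptions:
#   - dump.snappy is a mongoexport JSON-lines dump compressed with raw snappy
#     (pip install python-snappy mysql-connector-python)
#   - a target table `events(id VARCHAR(24), payload JSON)` already exists
import json
import snappy
import mysql.connector

with open("dump.snappy", "rb") as f:
    raw = snappy.uncompress(f.read())        # decompress the whole dump in memory

def oid(doc):
    # mongoexport renders ObjectId as {"$oid": "..."}; fall back to a plain string
    v = doc.get("_id", "")
    return v.get("$oid", "") if isinstance(v, dict) else str(v)

docs = [json.loads(line) for line in raw.decode("utf-8").splitlines() if line.strip()]

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="test")
cur = conn.cursor()
rows = [(oid(d), json.dumps(d)) for d in docs]
cur.executemany("INSERT INTO events (id, payload) VALUES (%s, %s)", rows)
conn.commit()
conn.close()
```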

Related

Convert thousands of small JSON files from s3 to one big CSV in lambda

I am trying to merge multiple small JSON files (about 500,000 files of 400-500 bytes each, which will no longer change) into one big CSV file, using AWS Lambda. I have a job that works something like this:
Use s3.listObjects() to fetch the keys
Use s3.getObject() to fetch each JSON file (is there a better way to do this?)
Create a CSV file in-memory (what's the best way to do this in nodejs?)
Upload that file to S3
I'd love to know if there's a better way to go about doing this. Thanks!
I would recommend using Amazon Athena.
It allows you to run SQL commands across multiple data files simultaneously (including JSON) and can create output files via Creating a Table from Query Results (CTAS) in Amazon Athena.
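As a rough sketch of that route (not a drop-in solution): once an external table has been declared over the JSON files, a single CTAS query can write the merged output back to S3. The table, column, bucket, and database names below are placeholders.

```python
# Sketch of the Athena CTAS approach. Assumptions:
#   - an external table `my_json_table` already exists over the JSON files
#     (e.g. declared with the org.openx.data.jsonserde.JsonSerDe)
#   - bucket names, columns, and the output prefix are made-up examples
import boto3

athena = boto3.client("athena", region_name="us-east-1")

ctas = """
CREATE TABLE merged_csv
WITH (
  format = 'TEXTFILE',
  field_delimiter = ',',
  external_location = 's3://my-bucket/merged-csv/'
) AS
SELECT id, name, value FROM my_json_table
"""

athena.start_query_execution(
    QueryString=ctas,
    QueryExecutionContext={"Database": "my_db"},
    ResultConfiguration={"OutputLocation": "s3://my-bucket/athena-results/"},
)
```

The CTAS result lands under the external_location prefix as delimited text files, which avoids pulling 500,000 objects through a Lambda function one by one.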

Big (>1GB) JSON data handling in Tableau

I am working with a large Twitter dataset in the form of a JSON file. When I try to import it into Tableau, there is an error and the upload fails on account of the 128 MB data upload limit.
Because of this, I need to shrink the dataset to bring it under 128 MB, thereby reducing the effectiveness of the analysis.
What is the best way to upload and handle large JSON data in Tableau?
Do I need to use an external tool for it?
Can we use AWS products to handle the same? Please advise!
From what I can find in unofficial documents online, Tableau does indeed have a 128 MB limit on JSON file size. You have several options.
Split the JSON files into multiple files and union them in your data source (https://onlinehelp.tableau.com/current/pro/desktop/en-us/examples_json.html#Union)
Use a tool to convert the JSON to csv or Excel (Google for JSON to csv converter)
Load the JSON into a database, such as MySQL, and use MySQL as the data source
You may want to consider posting in the Ideas section of the Tableau Community pages and add a suggestion for allowing larger JSON files. This will bring it to the attention of the broader Tableau community and product management.
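For options 2 and 3, a short sketch of the conversion step may help. It assumes the dump is one tweet per line (JSON Lines); the file names and the selected fields are examples only, not taken from the question.

```python
# Minimal sketch: stream a large JSON Lines dump into a CSV that Tableau
# (or MySQL's LOAD DATA INFILE) can consume. Field names are hypothetical.
import csv
import json

fields = ["id_str", "created_at", "text", "lang"]   # example columns

with open("tweets.json", encoding="utf-8") as src, \
     open("tweets.csv", "w", newline="", encoding="utf-8") as dst:
    writer = csv.DictWriter(dst, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for line in src:
        if line.strip():
            tweet = json.loads(line)
            writer.writerow({k: tweet.get(k, "") for k in fields})
```

Because the file is processed line by line, memory use stays flat regardless of the dump size.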

Converting to parquet file format while load to hive tables [duplicate]

This question already has an answer here:
Is it possible to load parquet table directly from file? (1 answer)
Closed 7 years ago.
We want to do real-time replication from MySQL to HDFS, with the files stored in Parquet format in the HDFS cluster.
As far as we know, we can do this using either
1) Tungsten Replicator, or
2) the MySQL server's support for live replication to HDFS.
But our problem is that neither of them supports conversion to Parquet while loading data into HDFS.
So we just wanted to know whether there is any way to do real-time replication with the files being stored as Parquet in the HDFS cluster.
The second question is: when you load a CSV file into a Hive table using "LOAD DATA INPATH", and the table has been defined with the Parquet file format, will Hive convert the file to Parquet, or do we need to write a utility to convert the file to Parquet and then load it?
Second question: the CREATE TABLE statement should specify the Parquet storage format with the STORED AS PARQUET syntax.
It all boils down to the version of Hive; some versions do not support the Parquet file format.
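To illustrate that answer: LOAD DATA INPATH only moves files into the table directory without converting them, so the common pattern is a plain-text staging table plus an INSERT ... SELECT into the Parquet table. The sketch below runs the statements through PyHive purely for illustration; the same HiveQL works in beeline or the Hive CLI, and the host, table, and column names are placeholders.

```python
# Sketch of the two-step load: text staging table -> Parquet table.
# The INSERT ... SELECT is what actually performs the conversion.
from pyhive import hive

conn = hive.connect(host="hive-server", port=10000, username="etl")
cur = conn.cursor()

# 1) Staging table that matches the raw CSV layout
cur.execute("""
CREATE TABLE IF NOT EXISTS staging_events (id BIGINT, name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
""")
cur.execute("LOAD DATA INPATH '/landing/events.csv' INTO TABLE staging_events")

# 2) Parquet-backed table populated from the staging table
cur.execute("""
CREATE TABLE IF NOT EXISTS events_parquet (id BIGINT, name STRING, amount DOUBLE)
STORED AS PARQUET
""")
cur.execute("INSERT OVERWRITE TABLE events_parquet SELECT * FROM staging_events")
```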

How to import the data from .MYD into MATLAB?

I just obtained a bunch of MySQL data stored in raw MySQL (MyISAM table) format in a .MYD file.
I now wish to start data analysis on that data. All I need to do is read the numbers into MATLAB and process them.
What is the easiest way of doing so? I am using Mac OS, by the way.
Creating a MySQL database and dropping the file into a (not running at the time) MySQL server is certainly one way to get to the stage where you have the data in a form you can re-export.
I am not familiar with macOS locations, but on Linux the data directory structure is:
/var/lib/mysql/databasename/*.MYI and *.MYD
I would be leery of trying to extract an ISAM file using anything other than MySQL, frankly.
Maybe someone else knows better, but I don't :-)
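Assuming the table files (.frm/.MYD/.MYI) restore cleanly under a running server's data directory, one follow-up step is to dump the table to CSV so MATLAB can read it with readtable/readmatrix. A small sketch, with the database, table, and credentials as placeholders:

```python
# Export a restored MyISAM table to CSV for MATLAB. Names are hypothetical.
import csv
import mysql.connector

conn = mysql.connector.connect(host="localhost", user="root",
                               password="secret", database="restored_db")
cur = conn.cursor()
cur.execute("SELECT * FROM measurements")   # hypothetical table name

with open("measurements.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])   # header row
    writer.writerows(cur)                                   # stream the rows

conn.close()
```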

Loading large CSV file to MySQL with Django and transformations

I have a large CSV file (5.4 GB) of data. It's a table with 6 columns and a lot of rows. I want to import it into MySQL across several tables. Additionally, I have to do some transformations to the data before the import (e.g. parse a cell and insert the parts into several table values, etc.). I could write a script that does the transformation and inserts one row at a time, but that would take weeks to import the data. I know there is LOAD DATA INFILE for MySQL, but I am not sure how, or if, I can do the needed transformations in SQL.
Any advice how to proceed?
In my limited experience you won't want to use the Django ORM for something like this. It will be far too slow. I would write a Python script to operate on the CSV file, using Python's csv library, and then use the native MySQL facility LOAD DATA INFILE to load the data.
If the Python script to massage the CSV file is too slow you may consider writing that part in C or C++, assuming you can find a decent CSV library for those languages.
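A rough sketch of that two-stage approach follows. The column names, the split logic, and the table layout are made-up examples; the point is the shape: stream the 5.4 GB file with the csv module, write transformed CSVs, then hand them to LOAD DATA LOCAL INFILE instead of inserting row by row.

```python
# Stage 1: transform (e.g. split one source cell across two target tables).
# Stage 2: bulk-load each output file. All names below are placeholders, and
# the MySQL server must have local_infile enabled for LOAD DATA LOCAL INFILE.
import csv
import mysql.connector

with open("source.csv", newline="") as src, \
     open("people.csv", "w", newline="") as people, \
     open("addresses.csv", "w", newline="") as addresses:
    reader = csv.reader(src)
    people_w, addr_w = csv.writer(people), csv.writer(addresses)
    for row in reader:
        person_id, full_name, raw_address = row[0], row[1], row[2]
        people_w.writerow([person_id, full_name])
        city, country = (raw_address.split("|") + ["", ""])[:2]   # hypothetical cell format
        addr_w.writerow([person_id, city, country])

conn = mysql.connector.connect(host="localhost", user="root", password="secret",
                               database="target_db", allow_local_infile=True)
cur = conn.cursor()
cur.execute("LOAD DATA LOCAL INFILE 'people.csv' INTO TABLE people "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
cur.execute("LOAD DATA LOCAL INFILE 'addresses.csv' INTO TABLE addresses "
            "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
conn.commit()
conn.close()
```

Splitting the work this way keeps the slow part (parsing) in a streaming Python pass and leaves the actual insertion to the server's bulk loader, which is typically orders of magnitude faster than per-row INSERTs.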