How to load data into MySQL from a txt file that has both field names and values in each row?

I am a newbie in MySQL but not in programming. I am trying to load a txt file with the following data into MySQL:
{"id": "5bf573b81c5a1dcdd96ec666", "DisplayName": "Eureka", "NormalizedName": "eureka", "JournalId": "465895"}
{"id": "5bf573b81c5a1dcdd96ec667", "DisplayName": "The Artist and Journal of Home Culture", "NormalizedName": "the artist and journal of home culture", "JournalId": "1137746"}
{"id": "5bf573b81c5a1dcdd96ec668", "DisplayName": "Cumberland Law Review", "NormalizedName": "cumberland law review", "JournalId": "2978343"}
Each line is wrapped in curly braces and contains column names paired with quoted values, each name separated from its value by a colon and the pairs separated by commas. I would like a table whose columns are named after these fields, tabulating their respective values without any punctuation marks. For a two-column data file, I did it manually with the following code:
DROP TABLE IF EXISTS venue_linking_pairs;
CREATE TABLE venue_linking_pairs (mid VARCHAR(40), aid VARCHAR(40)); -- optimise the field length
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/venue_linking_pairs.txt'
INTO TABLE venue_linking_pairs
FIELDS TERMINATED BY ','
(mid, aid)
SET
mid = SUBSTRING(mid, 10, 24) ,
aid = SUBSTRING(aid, 10, 24) ;
SELECT * FROM venue_linking_pairs ORDER BY mid LIMIT 10;
I would like to do this more automatically, e.g. setting the number of table columns in MySQL to match the txt data file, naming the columns after the field names in the file, and removing the punctuation marks automatically rather than manually inspecting the characters and using the SUBSTRING function. I would be grateful for your help.
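MySQL cannot create the table's columns from the file by itself, so the CREATE TABLE still has to name them once. But since each line is valid JSON, MySQL (5.7+) can locate every value by field name with JSON_EXTRACT instead of counting characters. A minimal sketch, assuming a hypothetical journals table named after the fields in your sample:
DROP TABLE IF EXISTS journals;
CREATE TABLE journals (id CHAR(24), DisplayName VARCHAR(255), NormalizedName VARCHAR(255), JournalId VARCHAR(20));
LOAD DATA INFILE 'C:/ProgramData/MySQL/MySQL Server 8.0/Uploads/journals.txt'
INTO TABLE journals
LINES TERMINATED BY '\n' -- use '\r\n' if the file was written with Windows line endings
(@j) -- the whole line lands in @j because the data contains no tab characters
SET
id = JSON_UNQUOTE(JSON_EXTRACT(@j, '$.id')),
DisplayName = JSON_UNQUOTE(JSON_EXTRACT(@j, '$.DisplayName')),
NormalizedName = JSON_UNQUOTE(JSON_EXTRACT(@j, '$.NormalizedName')),
JournalId = JSON_UNQUOTE(JSON_EXTRACT(@j, '$.JournalId'));
JSON_UNQUOTE strips the surrounding double quotes, so there is no manual punctuation handling; if a field is missing from a line, the column simply gets NULL.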

Related

Extract value from mysql json string stored as varchar

I have a MySQL column with JSON-like strings stored as varchar:
{'#type': 'Organization', 'legalName': 'some company inc.'}
I tried to extract it using the following:
SELECT JSON_EXTRACT(columnname, "$.legalName")
FROM tablename
WHERE indexfield='specifics'
But all I get is NULL as output and a warning message: "Current selection does not contain a unique column."
JSON has to have its names and values wrapped in DOUBLE QUOTES, like this:
{"#type": "Organization", "legalName": "some company inc."}
So
SELECT JSON_EXTRACT('{"#type": "Organization", "legalName": "some company inc."}', "$.legalName") as xxx;
Works
"some company inc."
While your example has single quotes
{'#type': 'Organization', 'legalName': 'some company inc.'}
So
SELECT JSON_EXTRACT("{'#type': 'Organization', 'legalName': 'some company inc.'}", "$.legalName") as xxx;
Does not work!
So it looks like the JSON you stored was not real JSON.
None of the answers resolved it. So what I did was export the table to CSV, load it into Google Sheets (after splitting it into manageable sizes), and then use 'text to column' to extract the value from the JSON-like string.
After that, I imported the CSV into a temp table, tested it, and brought it back live.
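If editing the data outside the database is not an option, a purely in-SQL workaround (a sketch, assuming the stored values never contain apostrophes or embedded double quotes, which would break the substitution) is to rewrite the single quotes on the fly before extracting:
SELECT JSON_EXTRACT(REPLACE(columnname, "'", '"'), '$.legalName')
FROM tablename
WHERE indexfield = 'specifics';
-- REPLACE turns every ' into ", producing valid JSON only for simple values
For anything beyond simple values, re-storing the column as real JSON is the safer fix.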

missing data for column with sql COPY

I'm working in a Jupyter notebook.
I have the following json file:
{"num_songs": 1, "artist_id": "ARJIE2Y1187B994AB7", "artist_latitude": null, "artist_longitude": null, "artist_location": "", "artist_name": "Line Renaud", "song_id": "SOUPIRU12A6D4FA1E1", "title": "Der Kleine Dompfaff", "duration": 152.92036, "year": 0}
I want to load it into a table using a COPY statement.
I've created my table like this:
song_table_create = (""" CREATE TABLE songs (song_id varchar, title varchar, artist_id varchar, year int, duration float)""")
Here is my copy statement:
test_statement = """copy songs (song_id, title, artist_id, year, duration) from '/home/workspace/data/song_data/A/B/C/TRABCRU128F423F449.json';"""
cur.execute(test_statement)
When running it, I got the following error:
DataError: missing data for column "title"
CONTEXT: COPY songs, line 1: "{"num_songs": 1, "artist_id": "AR8IEZO1187B99055E", "artist_latitude": null, "artist_longitude": nul..."
What's wrong?
I had the same issue a few weeks back, and the first option below helped me resolve it.
I would check for the following:
A space character (one or more) escaped with a backslash at the end of an unquoted value.
One or more lines of your JSON containing the delimiter character (a tab by default), which splits the line into the wrong number of fields.
This documentation link may help you understand the COPY command.
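The root cause is that COPY in text format expects one tab-separated field per target column, while each line of the file is a single JSON document, so PostgreSQL reads the whole line as song_id and finds nothing for the next column, title. A common workaround (a sketch, assuming PostgreSQL 9.4+ and a hypothetical one-column staging table) is to COPY each line into a jsonb column and project the fields from there:
CREATE TABLE songs_staging (doc jsonb);
-- csv format with unused quote/delimiter bytes reads each line verbatim,
-- avoiding the backslash escaping of the default text format
COPY songs_staging (doc)
FROM '/home/workspace/data/song_data/A/B/C/TRABCRU128F423F449.json'
WITH (FORMAT csv, QUOTE E'\x01', DELIMITER E'\x02');
INSERT INTO songs (song_id, title, artist_id, year, duration)
SELECT doc->>'song_id', doc->>'title', doc->>'artist_id',
       (doc->>'year')::int, (doc->>'duration')::float
FROM songs_staging;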

AWS Glue Crawler for JSONB column in PostgreSQL RDS

I've created a crawler that looks at a PostgreSQL 9.6 RDS table with a JSONB column, but the crawler identifies the column type as "string". When I then try to create a job that loads data from a JSON file on S3 into the RDS table, I get an error.
How can I map a JSON file source to a JSONB target column?
It's not quite a direct copy, but an approach that has worked for me is to define the column on the target table as TEXT. After the Glue job populates the field, I then convert it to JSONB. For example:
alter table postgres_table
alter column column_with_json set data type jsonb using column_with_json::jsonb;
Note the use of the cast for the existing text data. Without that, the alter column would fail.
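Once the cast succeeds, the column behaves like any other jsonb field. A quick sanity check (a sketch, reusing the hypothetical names above):
-- every row should now report a JSON type such as 'object'
select jsonb_typeof(column_with_json) as json_type, count(*)
from postgres_table
group by 1;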
The crawler will identify the JSONB column type as "string", but you can try to use the Unbox class in Glue to convert this column to JSON.
Let's check the following table in PostgreSQL:
create table persons (id integer, person_data jsonb, creation_date timestamp )
Here is an example of one record from the persons table:
ID = 1
PERSON_DATA = {
"firstName": "Sergii",
"age": 99,
"email":"Test#test.com"
}
CREATION_DATE = 2021-04-15 00:18:06
The following code needs to be added in Glue:
# 1. create a dynamic frame from the catalog
df_persons = glueContext.create_dynamic_frame.from_catalog(database = "testdb", table_name = "persons", transformation_ctx = "df_persons")
# 2. in 'path', put the name of the jsonb column that needs to be converted to json
df_persons_json = Unbox.apply(frame = df_persons, path = "person_data", format = "json")
# 3. convert from dynamic frame to data frame
datf_persons_json = df_persons_json.toDF()
# 4. after that you can process this column as a json datatype, or build a dataframe with all the necessary columns; each json element can become a separate column:
final_df_person = datf_persons_json.select("id", "person_data.age", "person_data.firstName", "creation_date")
You can also check the following link:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-transforms-Unbox.html

Importing CSV While Creating Relationship In Neo4j

I am trying to create a relationship between two different graphs, using information in a CSV file. I built the query the way I did because of the size of each graph, one being 500k+ and the other 1.5m+.
This is the query I have:
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row WITH row
MATCH (m:Main) WITH m
MATCH (c:Customers) USING INDEX c:Customers(customer)
WHERE m.ASIN = row.asin AND c.customer = row.customer
CREATE (c)-[:RATED]->(m)
This is the error I receive:
Variable `row` not defined (line 4, column 16 (offset: 164))
"WHERE m.ASIN = row.asin AND c.customer = row.customer"
^
An example of the Main table is:
{
"ASIN": "0827229534",
"totalreviews": "2",
"categories": "2",
"title": "Patterns of Preaching: A Sermon Sampler",
"avgrating": "5",
"group": "Book"
}
And an example of a customer is:
{
"customer": "A2FMUVHRO76A32"
}
And inside the customers table csv, I have:
Customer, ASIN, rating
A2FMUVHRO76A32, 0827229534, 5
I can't seem to figure out why it's throwing back that error.
The first WITH clause in your query (WITH row) is unnecessary, but you do have to add the row variable to the second WITH clause. So this version compiles:
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row
MATCH (m:Main)
WITH m, row
MATCH (c:Customers) USING INDEX c:Customers(customer)
WHERE m.ASIN = row.asin AND c.customer = row.customer
CREATE (c)-[:RATED]->(m)
The reason for this is that, in essence, WITH chains two query parts together while limiting the scope to its variables (and in some cases also performing calculations, aggregations, etc.).
Having said that, you do not even need the second WITH clause; you can just omit it and even merge the two MATCH clauses into a single one:
LOAD CSV WITH HEADERS FROM "file:///customers_table.csv" AS row
MATCH (m:Main), (c:Customers) USING INDEX c:Customers(customer)
WHERE m.ASIN = row.asin AND c.customer = row.customer
CREATE (c)-[:RATED]->(m)

How to generate a psv file from apache drill

The current way I am going about creating a pipe separated value (psv) file is to first create a view with a query like
Create view ABC as
select column 1 || '|' || column 2 || '|' || ..
And then use !record to do a select * from ABC.
This takes a lot of development time and is error prone, as the files that I need to generate have hundreds of columns.
Is there a simple way of going about this?
In your storage plugin, create a custom format.
Here is the documentation
https://drill.apache.org/docs/plugin-configuration-basics/
"formats": {
"psv": {
"type": "text",
"extensions": [
"tbl"
],
"delimiter": "|"
}
}
Alter your session to set your default store format:
alter session set `store.format`='psv';
Use CTAS to write the data in the format specified above:
create table `users.vgunnu`.`vt_del_test` as select * from dfs.root.`/tmp/test_parquet` limit 3;
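To confirm the delimiter, you can read the new table straight back through the same plugin (a sketch, reusing the workspace from the example above):
select * from `users.vgunnu`.`vt_del_test` limit 3;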
More info on the format:
http://drill.apache.org/docs/create-table-as-ctas-command/