Search Replace in MySQL: remove directory structure but keep filename

I am changing directory structures in a Drupal installation and need to remove all path data except the file name itself.
So the basic structure is:
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
| entity_type | bundle       | deleted | entity_id | revision_id | language | delta | field_filename_value                                                             | field_filename_format |
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
The filename is stored in field_filename_value. Here's a sample record:
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
| entity_type | bundle       | deleted | entity_id | revision_id | language | delta | field_filename_value                                                             | field_filename_format |
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
| node        | presentation | 0       | 11        | 11          | und      | 0     | /really long path name/with lots of words/167 Clarence Ashley - Coo Coo Bird.mp3 | NULL                  |
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
That ridiculous filename value needs to be changed from:
/really long path name/with lots of words/167 Clarence Ashley - Coo Coo Bird.mp3
To this:
167 Clarence Ashley - Coo Coo Bird.mp3
Setting aside the bad practice of using spaces in file/directory names, how would you correct this? Is it possible using MySQL features alone?
As an added challenge, some files may be more than 2 directories deep.

Use SUBSTRING_INDEX:
select substring_index('http://www.example.com/dev/archive/examples/test.htm','/',-1)
(Both of the above are taken from MySQL String Last Index Of.)
Using it is easy, but to explain: with a count of -1, SUBSTRING_INDEX locates the last occurrence of the / and returns everything to the right of it, so no second substring call is needed to cut off what lies to the left. And because it always keys on the last /, it also handles files nested more than two directories deep.
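Applied to the question, the whole fix is a single UPDATE. A minimal sketch, assuming the field lives in a table named field_data_field_filename per Drupal 7's field_data_<field_name> convention (the post never names the table):

-- Keep only what follows the last '/' in each stored value.
-- The table name is an assumption based on Drupal 7's naming convention.
UPDATE field_data_field_filename
SET field_filename_value = SUBSTRING_INDEX(field_filename_value, '/', -1)
WHERE field_filename_value LIKE '%/%';

The WHERE clause simply skips rows that contain no slash, which also makes the statement harmless to run twice.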

Related

Is implementing an enrichment using Spark with MySQL a bad idea?

I am trying to build one giant schema that makes it easier for data users to query. To achieve that, streaming events have to be joined with User Metadata by USER_ID and ID. In data engineering, this operation is called "data enrichment", right? The tables below are an example.
# `Event` (Stream)
+---------+--------------+---------------------+
| USER_ID | EVENT        | TIMESTAMP           |
+---------+--------------+---------------------+
| 1       | page_view    | 2020-04-10T12:00:11 |
| 2       | button_click | 2020-04-10T12:01:23 |
| 3       | page_view    | 2020-04-10T12:01:44 |
+---------+--------------+---------------------+
# `User Metadata` (Static)
+----+-------+--------+
| ID | NAME  | GENDER |
+----+-------+--------+
| 1  | Matt  | MALE   |
| 2  | John  | MALE   |
| 3  | Alice | FEMALE |
+----+-------+--------+
==> # Result
+---------+--------------+---------------------+-------+--------+
| USER_ID | EVENT        | TIMESTAMP           | NAME  | GENDER |
+---------+--------------+---------------------+-------+--------+
| 1       | page_view    | 2020-04-10T12:00:11 | Matt  | MALE   |
| 2       | button_click | 2020-04-10T12:01:23 | John  | MALE   |
| 3       | page_view    | 2020-04-10T12:01:44 | Alice | FEMALE |
+---------+--------------+---------------------+-------+--------+
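(Written as plain SQL, the enrichment above is just an inner join; the table names event and user_metadata below are placeholders for wherever the two datasets live.)

SELECT e.USER_ID, e.EVENT, e.TIMESTAMP, u.NAME, u.GENDER
FROM event e
JOIN user_metadata u ON u.ID = e.USER_ID;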
I was developing this using Spark, with the User Metadata stored in MySQL, and then I realized it would waste Spark's parallelism if the Spark code joined directly against MySQL tables, right?
The bottleneck will be MySQL if traffic increases, I guess.
Should I store those tables in a key-value store and update it periodically?
Can you give me some ideas for tackling this problem? How do you usually handle this type of operation?
Solution 1:
As you suggested, you can keep a local cache copy of the table in key-value form and update the cache at a regular interval.
Solution 2:
You can use a MySQL-to-Kafka connector, such as Debezium:
https://debezium.io/documentation/reference/1.1/connectors/mysql.html
For every DML or table-alter operation on your User Metadata table, a corresponding event is fired to a Kafka topic (e.g. db_events). You can run a parallel thread in your Spark streaming job that polls db_events and updates your local key-value cache.
This would make your application a near-real-time application in the true sense.
One overhead I can see is the need to run a Kafka Connect service with the MySQL connector (i.e. Debezium) as a plugin.

Can you use "knight moves" to query a single database and join it to itself?

I am working on a project that involves code in both Prolog and SQL to solve the same problem. A problem I've run across is that I can't use a single database to form a hierarchy. In this list of Prolog facts you can see that the "assembly" parts are related to each other.
basicpart(spoke).
basicpart(rearframe).
basicpart(handles).
basicpart(gears).
basicpart(bolt).
basicpart(nut).
basicpart(fork).
assembly(bike,[wheel,wheel,frame]).
assembly(wheel,[spoke,rim,hub]).
assembly(frame,[rearframe,frontframe]).
assembly(frontframe,[fork,handles]).
assembly(hub,[gears,axle]).
assembly(axle,[bolt,nut]).
If I put all of these "assembly" definitions into one SQL database, can I use knight moves (joining a table to itself on 2 different columns in it) to build this hierarchy in SQL in only 2 tables?
If I understand the question correctly: you cannot construct your bike with just one ordinary query (I'm not familiar with the term "knight moves"). In fact, you can, but it must be a recursive SQL query, because you will be computing the transitive closure of the part-subpart relationship.
Unfortunately I don't immediately know how to write these. SQL syntax is frankly abysmal and recursive SQL looks even abysmaller, so below is example code using a loop instead.
You actually need only one table to represent the data, as the basicpart/1 relation does not bring anything to the table except labeling certain "things" as basic; these are exactly the things that never appear in the first position of assembly/2.
Notes:
Not using ENUMs, which are not really "types" in MySQL/MariaDB but just a constraint on a field of a specific table. (Like, WTF!)
The multiset representation of the Prolog code ("a bike has two wheels") is flattened into multiple rows separately identified by a numeric surrogate id. This is due to the "First Normal Form" dogma of RDBMS practice. There is a priori nothing wrong with having multisets as values, if the query language and the RDBMS engine support it. For example, you can have XML values in PostgreSQL, complete with queries over their content, as I remember [1].
DELIMITER //

DROP PROCEDURE IF EXISTS prepare //

CREATE PROCEDURE prepare()
BEGIN
   DROP TABLE IF EXISTS assembly;
   CREATE TABLE assembly
      (id      INT AUTO_INCREMENT KEY, -- surrogate key because a bike may have several wheels
       part    VARCHAR(10) NOT NULL,
       subpart VARCHAR(10) NOT NULL);
   INSERT INTO assembly(part,subpart) VALUES
      ("bike","wheel"),
      ("bike","wheel"),
      ("bike","frame"),
      ("wheel","spoke"),
      ("wheel","rim"),
      ("wheel","hub"),
      ("frame","rearframe"),
      ("frame","frontframe"),
      ("frontframe","fork"),
      ("frontframe","handles"),
      ("hub","gears"),
      ("hub","axle"),
      ("axle","bolt"),
      ("axle","nut");
END //

DROP PROCEDURE IF EXISTS compute_transitive_closure //

CREATE PROCEDURE compute_transitive_closure()
BEGIN
   DROP TABLE IF EXISTS pieces;
   CREATE TABLE pieces
      (id      INT AUTO_INCREMENT KEY,
       part    VARCHAR(10) NOT NULL,
       subpart VARCHAR(10) NOT NULL,
       path    VARCHAR(500) NOT NULL DEFAULT "",
       depth   INT NOT NULL DEFAULT 0);
   INSERT INTO pieces(part,subpart,path,depth) VALUES
      ("ROOT","bike","/bike",0);
   SET @depth = 0;
   l: LOOP
      -- expand every piece at the current depth by one level
      INSERT INTO pieces(part,subpart,path,depth)
      SELECT
         p.subpart,
         a.subpart,
         CONCAT(p.path,'/',a.subpart),
         @depth + 1
      FROM
         pieces p,
         assembly a
      WHERE
         p.depth = @depth AND p.subpart = a.part;
      IF ROW_COUNT() <= 0 THEN
         LEAVE l;
      ELSE
         SELECT * FROM pieces; -- show the intermediate state after each pass
      END IF;
      SET @depth = @depth + 1;
   END LOOP;
END //

DELIMITER ;
Put the above into a file SQL.txt, and then, in a database testme:
MariaDB [testme]> source SQL.txt;
MariaDB [testme]> CALL prepare;
MariaDB [testme]> CALL compute_transitive_closure;
Then, after 4 passes through the loop, you get:
+----+------------+------------+--------------------------------+-------+
| id | part       | subpart    | path                           | depth |
+----+------------+------------+--------------------------------+-------+
|  1 | ROOT       | bike       | /bike                          |     0 |
|  2 | bike       | wheel      | /bike/wheel                    |     1 |
|  3 | bike       | wheel      | /bike/wheel                    |     1 |
|  4 | bike       | frame      | /bike/frame                    |     1 |
|  5 | wheel      | spoke      | /bike/wheel/spoke              |     2 |
|  6 | wheel      | spoke      | /bike/wheel/spoke              |     2 |
|  7 | wheel      | rim        | /bike/wheel/rim                |     2 |
|  8 | wheel      | rim        | /bike/wheel/rim                |     2 |
|  9 | wheel      | hub        | /bike/wheel/hub                |     2 |
| 10 | wheel      | hub        | /bike/wheel/hub                |     2 |
| 11 | frame      | rearframe  | /bike/frame/rearframe          |     2 |
| 12 | frame      | frontframe | /bike/frame/frontframe         |     2 |
| 20 | frontframe | fork       | /bike/frame/frontframe/fork    |     3 |
| 21 | frontframe | handles    | /bike/frame/frontframe/handles |     3 |
| 22 | hub        | gears      | /bike/wheel/hub/gears          |     3 |
| 23 | hub        | gears      | /bike/wheel/hub/gears          |     3 |
| 24 | hub        | axle       | /bike/wheel/hub/axle           |     3 |
| 25 | hub        | axle       | /bike/wheel/hub/axle           |     3 |
| 27 | axle       | bolt       | /bike/wheel/hub/axle/bolt      |     4 |
| 28 | axle       | nut        | /bike/wheel/hub/axle/nut       |     4 |
| 29 | axle       | bolt       | /bike/wheel/hub/axle/bolt      |     4 |
| 30 | axle       | nut        | /bike/wheel/hub/axle/nut       |     4 |
+----+------------+------------+--------------------------------+-------+
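(An aside beyond the original answer: on MySQL 8.0+ or MariaDB 10.2+, the recursive query can be written directly, replacing the whole LOOP procedure with a single statement. A sketch against the same assembly table; the CAST keeps the path column wide enough, since otherwise the anchor row would fix its width at 5 characters:)

WITH RECURSIVE pieces (part, subpart, path, depth) AS (
  SELECT 'ROOT', 'bike', CAST('/bike' AS CHAR(500)), 0
  UNION ALL
  SELECT p.subpart, a.subpart, CONCAT(p.path, '/', a.subpart), p.depth + 1
  FROM pieces p
  JOIN assembly a ON a.part = p.subpart
)
SELECT * FROM pieces;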
[1] This made me dig out "Database in Depth: Relational Theory for Practitioners" (O'Reilly, 2005) by Chris Date, an excellent introduction to the relational model. On page 30, Date considers "sets as values" (but does not consider "multisets"):
Second (and regardless of what you might think of my first argument),
the fact is that a set like {P2,P4,P5} is no more and no less
decomposable by the DBMS than a character string is. Like character
strings, sets do have some inner structure; as with characters
strings, however, it's convenient to ignore that structure for certain
purposes. In other words, if a character string is compatible with the
requirements of 1NF - that is, if character strings are atomic - then
sets must be, too. The real point I'm getting at here is that the
notion of atomicity has no absolute meaning; it just depends on what
we want to do with the data. Sometimes we want to deal with an entire
set of part numbers as a single thing, and sometimes we want to deal
with individual part numbers within that set - but then we are
descending to a lower level of detail (a lower level of abstraction).

pyqt4 - MySQL: how to print single/multiple row(s) of a table in the TableViewWidget

I've recently tried to create an executable with Python 2.7 which can read a MySQL database.
The database (named 'montre') contains two tables: patient and proto_1.
Here is the content of those tables :
mysql> select * from proto_1;
+----+------------+---------------------+-------------+-------------------+---------------+----------+
| id | Nom_Montre | Date_Heure          | Temperature | Pulsion_cardiaque | Taux_oxy_sang | Humidite |
+----+------------+---------------------+-------------+-------------------+---------------+----------+
|  1 | montre_1   | 2017-11-27 19:33:25 |       22.30 |              NULL |          NULL |     NULL |
|  2 | montre_1   | 2017-11-27 19:45:12 |       22.52 |              NULL |          NULL |     NULL |
+----+------------+---------------------+-------------+-------------------+---------------+----------+
mysql> select * from patient;
+----+-----------+--------+------+------+---------------------+------------+--------------+
| id | nom       | prenom | sexe | age  | date_naissance      | Nom_Montre | commentaires |
+----+-----------+--------+------+------+---------------------+------------+--------------+
|  2 | RICHEMONT | Robert | M    |   37 | 1980-04-05 23:43:00 | montre_3   | essaye2      |
|  3 | PIERRET   | Mandy  | F    |   22 | 1995-04-05 10:43:00 | montre_4   | essaye3      |
| 14 | PIEKARZ   | Allan  | M    |   22 | 1995-06-01 10:32:56 | montre_1   | Healthy man  |
+----+-----------+--------+------+------+---------------------+------------+--------------+
As I'm just used to coding in C (no OOP), I didn't create classes in the Python project (shame on me...). But I managed, in two files, to create something (with mysql.connector) which can print my database to the console and execute subroutines like looking-for(), etc.
Now I want to create a GUI for users with PyQt. Unfortunately, I saw that the structure is totally different, with classes etc. But okay, I tried to work through this, and I've created a GUI which can display the table "patient". But I didn't manage to find in the Qt documentation how I can use the programs I've already created to do the display, nor how to display only certain rows of my patient table in a tableWidget, for example (using QtSql).
For example, if I want to display the whole patient table, I use this line (PyQt):
self.model.setTable("patient")
I got that far, but it bothers me that no MySQL code is involved in displaying the table, so I don't know how to select only the rows we want to see and display them. If we only want to see, for example, ID no. 2, how do we display only Robert in the tableWidget?
To recap, I want to know:
if I can take the code I've created and combine it with PyQt;
how to display (in a tableWidget) only rows which are selected by MySQL. Is that possible?
Please find my code at the URL below for a better understanding of my problem:
https://drive.google.com/file/d/1nxufjJfF17P5hN__CBEcvrbuHF-23aHN/view?usp=sharing
I hope I was clear, thank you all for your help !
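(A pointer beyond the original thread: QSqlTableModel restricts rows through its setFilter() method, which takes the text of a SQL WHERE clause, e.g. self.model.setFilter("id = 2") after the setTable() call above. The method is standard Qt API; whether it slots into the asker's code as written is an assumption.)

-- What the model effectively executes after setTable("patient")
-- followed by setFilter("id = 2"):
SELECT * FROM patient WHERE id = 2;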

Extracting and Constructing Tables from HTML Files using Julia

Here's a public link to an example html file. I would like to extract each set of CAN and yearly tax information (example highlighted in red in the image below) from the file and construct a dataframe that looks like the one below.
Target Fields
Example DataFrame
| Row | CAN          | Crtf_NoCrtf | Tax_Year | Land_Value | Improv_Value | Total_Value | Total_Tax |
|-----+--------------+-------------+----------+------------+--------------+-------------+-----------|
|   1 | 184750010210 | Yes         |     2016 |      16720 |       148330 |      165050 |   4432.24 |
|   2 | 184750010210 | Yes         |     2015 |      16720 |       128250 |      144970 |   3901.06 |
|   3 | 184750010210 | Yes         |     2014 |      16720 |       109740 |      126460 |   3412.63 |
|   4 | 184750010210 | Yes         |     2013 |      16720 |       111430 |      128150 |   3474.46 |
|   5 | 184750010210 | Yes         |     2012 |      16720 |        99340 |      116060 |   3146.17 |
|   6 | 184750010210 | Yes         |     2011 |      16720 |       102350 |      119070 |   3218.80 |
|   7 | 184750010210 | Yes         |     2010 |      16720 |       108440 |      125160 |   3369.97 |
|   8 | 184750010210 | Yes         |     2009 |      16720 |       113870 |      130590 |   3458.14 |
|   9 | 184750010210 | Yes         |     2008 |      16720 |       122390 |      139110 |   3629.85 |
|  10 | 184750010210 | Yes         |     2007 |      16720 |       112820 |      129540 |   3302.72 |
|  11 | 184750010210 | Yes         |     2006 |      12380 |       112760 |             |   3623.12 |
|  12 | 184750010210 | Yes         |     2005 |      19800 |       107400 |             |   3882.24 |
Additional Information
If it is not possible to insert the CAN into each row, that is okay; I can export the CAN numbers separately and find a way to attach them to the dataframe containing the tax values. I have looked into using Beautiful Soup for Python, but I am an absolute novice with Python, and the rest of the scripts I am writing are in Julia, so I would prefer to keep everything in one language.
Is there any way to achieve what I am trying to achieve? I have looked at Gumbo.jl but cannot find any detailed documentation/tutorials.
So Gumbo.jl will parse the HTML and give you a programmatic representation of the structure of the HTML file (called a DOM, or Document Object Model). This is typically a tree of html tags, which you can traverse to extract the data you need.
To make this easier, what you really want is a way to query the DOM, so that you can extract the data you need without having to traverse the entire tree yourself. The Cascadia.jl project does this for you. It is built on top of Gumbo, and uses CSS selectors as the query language.
So for your example, you could use something like the following to extract all the CAN fields:
julia> using Gumbo
julia> using Cascadia
julia> h=parsehtml(read("/Users/aviks/Download/z1.html", String))
julia> c = matchall(Selector("td:containsOwn(\"CAN:\") + td span"), h.root)
13-element Array{Gumbo.HTMLNode,1}:
Gumbo.HTMLElement{:span}:
<span class="value">184750010210</span>
...
#print all the CAN values
julia> for x in c
           println(x.children[1].text)
       end
184750010210
186170040070
175630130020
172640020290
168330020230
156340030160
118210000020
190490040500
173480080430
161160010050
153510060090
050493000250
050470630910
Hopefully this gives you an idea of how to extract all the data you need.
The current answer is a bit out of date since the readall() function no longer exists. I'll update his answer below.
Here's a general breakdown of the package ecosystem for Julia (as of the time of writing this answer):
Requests.jl is used to download the HTML file itself (note that in avik's answer, he reads the HTML file from his local machine)
Cascadia.jl is required to search for CSS tags (e.g. the tag that you would find if you were to use Selector Gadget).
Gumbo.jl is required to parse the resulting HTML
The key thing to remember is that Gumbo stores objects in tree format as HTMLNodes or HTMLElements. So most objects have "parents" and "children." To get the data you need, it's simply a matter of filtering with the right selector (using Cascadia) and then going to the correct point in the Gumbo tree.
An updated version of avik's answer:
using Requests, Cascadia, Gumbo

# r = get(url) # Normally, you'd put a url here, but I couldn't find a way to grab it without having to download it and read it locally
# h = parsehtml(String(r.data)) # Then normally you'd execute this
# Instead, I'm going to read in the html file as a string and give it to Gumbo
h = parsehtml(readstring("z1.html"))

# Exploring the structure of Gumbo objects:
println(fieldnames(h.root))
println(fieldnames(h.root.children))
println(size(h.root.children))

# avik's code:
c = matchall(Selector("td:containsOwn(\"CAN:\") + td span"), h.root);
for x in c
    println(x.children[1].text)
end
This particular webpage is more difficult to scrape than most, since it doesn't have a great CSS structure.
There's some nice documentation on workflow on the Cascadia README, but I still had some questions after reading it. For anyone else (like me, yesterday) who comes to this page looking for guidance on web scraping in Julia, I've created a jupyter notebook with a simple example that will hopefully help you understand the workflow in greater detail.

MySQL -> HTML Report, Styled like a Pivot Table

Ok, I'd like to start off by apologizing (profusely), since this seems to be a common question. Most of the examples seem somewhat similar as well, but, for the life of me, I cannot wrap my brain around how to apply the myriad of quality responses to my specific table. And I'm sure it's probably just the easiest thing in the world, what with all the very thorough responses/examples/links to resources with explanations.
So, I suppose I'll just get right to it. The basics:
We host off-site copies of our clients' backups.
We need to know how much space they're using.
We are not at all consistent in Naming Convention, folder vs. disk per client, etc.
We need to automate a 'report', monthly, with data as follows:
-[C.Srv 01]---Size(GB)--Free(%)
  Client 01      [Total]   [AVG]
    Server 01    109.43    25
    Server 02    415.19    25
WHERE C.Srv = [Specified Cloud Server]
Clients Get a Total Size(GB) and an Average Free(%)
My MySQL table is this:
#   Name    DataType  Length/Set  Unsigned  Allow NULL  ZeroFill  Default
1.  ID      INT       11                                          AUTO_INCREMENT
2.  Client  TEXT
3.  Server  TEXT
4.  C.Srv   TEXT
5.  Size    DECIMAL   10,2
6.  Free    DECIMAL   10,4
So, for example, let's say I have this...
 ___ ________ ________ _______ _________ _______
 ID | CLIENT | SERVER | C.SRV | SIZE    | FREE
 ---|--------|--------|-------|---------|-------
  1 | a      | adc    | cs_01 |  109.43 | 0.2504
  2 | a      | asql   | cs_01 |  415.19 | 0.2504
  3 | b      | bdc    | cs_01 |  583.91 | 0.1930
  4 | b      | bdev   | cs_01 |  316.52 | 0.1930
  5 | b      | bsql   | cs_01 | 1259.56 | 0.1930
  6 | c      | cdc    | cs_01 |  355.30 | 0.7631
  7 | d      | ddc    | cs_01 |  398.21 | 0.5808
Is it possible to get something pretty, in HTML (preferably), that has the basic structure of this...
 _______ __________ ________
 CS_01  | Size(GB) | Free(%)
 -------|----------|--------
 -a     |   524.62 | 25.04%
 -------|----------|--------
   adc  |   109.43 | 25.04%
   asql |   415.19 | 25.04%
 -b     |  2178.88 | 19.30%
 -------|----------|--------
   bdc  |   583.91 | 19.30%
   bdev |   316.52 | 19.30%
   bsql | 1259.56  | 19.30%
 +c     |   355.30 | 76.31%
 -------|----------|--------
 +d     |   398.21 | 58.08%
 _______|__________|________
Or am I just S.O.L.? I can mess with the format in CSS or whatever (I hope), just so long as it's in that basic structure. (I don't know if it matters, but the final goal will be to collapse the report at the client level, in case that somehow factors into the approach/data-gathering.)
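(A sketch beyond the original question: MySQL can produce the per-client subtotals and averages itself, before any HTML styling is applied. The table name backups is an assumption, since the post never names it; WITH ROLLUP adds one summary row per client, recognizable by Server IS NULL, plus a grand-total row at the end.)

SELECT Client,
       Server,
       SUM(Size)                 AS Size_GB,
       ROUND(AVG(Free) * 100, 2) AS Free_Pct
FROM backups
WHERE `C.Srv` = 'cs_01'
GROUP BY Client, Server WITH ROLLUP;

Each Client group then maps onto one collapsible section of the HTML report, with its rollup row serving as the header line.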