Mercurial repository internal format - mercurial

Docs regarding repository format over at Mercurial site are scattered all over and refer to various legacy versions as well as current one and all in all aren't very detailed IMO.
Is there any comprehensive and up-to-date description of Mercurial repository format?
One Year Later
HgSharp: 100% binary-compatible Mercurial Core implemented in C#.

The design of Mercurial is described in The Architecture of Open Source Applications, Chapter 12: Mercurial.

The Design page at least references the relevant wiki pages.
It is indeed scattered, but it is listed in that page.
    .--------linkrev-------------.
    v                            |
.---------.    .--------.    .--------.
|changeset| .->|manifest| .->|file    |---.
|index    | |  |index   | |  |index   |   |--.
`---------' |  `--------' |  `--------'   |  |
    |       |      |      |     |  `-------'  |
    V       |      V      |     V    `-------'
.---------. |  .--------. |  .---------.
|changeset|-'  |manifest|-'  |file     |
|data     |    |data    |    |revision |
`---------'    `--------'    `---------'
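
To make the index/data split in the diagram a bit more concrete, here is a minimal Python sketch of decoding one index record. It assumes the 64-byte RevlogNG record layout (offset plus flags, compressed and uncompressed length, base revision, linkrev, two parent revisions, a 20-byte node hash and padding). The field names and the sample values are made up for illustration, so check the layout against the Mercurial wiki for the repository version you are reading.

```python
import struct
from collections import namedtuple

# Assumed RevlogNG index record: 64 bytes, big-endian.
INDEX_ENTRY = struct.Struct(">Qiiiiii20s12x")

# Field names are illustrative, not Mercurial's own identifiers.
IndexEntry = namedtuple(
    "IndexEntry",
    "offset flags comp_len uncomp_len base_rev link_rev p1 p2 node",
)


def unpack_entry(raw):
    """Decode one 64-byte index record into named fields."""
    (offset_flags, comp_len, uncomp_len,
     base_rev, link_rev, p1, p2, node) = INDEX_ENTRY.unpack(raw)
    # The first 8 bytes carry a 48-bit data offset and 16 bits of flags.
    return IndexEntry(offset_flags >> 16, offset_flags & 0xFFFF,
                      comp_len, uncomp_len, base_rev, link_rev, p1, p2, node)


# Synthetic record with hypothetical values: data at offset 1024,
# linked to changeset revision 7, one parent (p2 == -1 means "no parent").
raw = INDEX_ENTRY.pack(1024 << 16, 120, 300, 0, 7, 2, -1, b"\x11" * 20)
entry = unpack_entry(raw)
print(entry.offset, entry.link_rev, entry.p1, entry.p2)
```

The `link_rev` field is the linkrev arrow in the diagram: a manifest or file revision records which changeset revision introduced it.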


Extracting and Constructing Tables from HTML Files using Julia

Here's a public link to an example html file. I would like to extract each set of CAN and yearly tax information (example highlighted in red in the image below) from the file and construct a dataframe that looks like the one below.
Target Fields
Example DataFrame
| Row | CAN | Crtf_NoCrtf | Tax_Year | Land_Value | Improv_Value | Total_Value | Total_Tax |
|-----+--------------+-------------+----------+------------+--------------+-------------+-----------|
| 1 | 184750010210 | Yes | 2016 | 16720 | 148330 | 165050 | 4432.24 |
| 2 | 184750010210 | Yes | 2015 | 16720 | 128250 | 144970 | 3901.06 |
| 3 | 184750010210 | Yes | 2014 | 16720 | 109740 | 126460 | 3412.63 |
| 4 | 184750010210 | Yes | 2013 | 16720 | 111430 | 128150 | 3474.46 |
| 5 | 184750010210 | Yes | 2012 | 16720 | 99340 | 116060 | 3146.17 |
| 6 | 184750010210 | Yes | 2011 | 16720 | 102350 | 119070 | 3218.80 |
| 7 | 184750010210 | Yes | 2010 | 16720 | 108440 | 125160 | 3369.97 |
| 8 | 184750010210 | Yes | 2009 | 16720 | 113870 | 130590 | 3458.14 |
| 9 | 184750010210 | Yes | 2008 | 16720 | 122390 | 139110 | 3629.85 |
| 10 | 184750010210 | Yes | 2007 | 16720 | 112820 | 129540 | 3302.72 |
| 11 | 184750010210 | Yes | 2006 | 12380 | 112760 | | 3623.12 |
| 12 | 184750010210 | Yes | 2005 | 19800 | 107400 | | 3882.24 |
Additional Information
If it is not possible to insert the CAN to each row that is okay, I can export the CAN numbers separately and find a way to attach them to the dataframe containing the tax values. I have looked into using beautiful soup for python, but I am an absolute novice with python and the rest of the scripts I am writing are in Julia, so I would prefer to keep everything in one language.
Is there any way to achieve what I am trying to achieve? I have looked at Gumbo.jl but can not find any detailed documentation/tutorials.
So Gumbo.jl will parse the HTML and give you a programmatic representation of the structure of the HTML file (called a DOM - Document Object Model). This is typically a tree of HTML tags, which you can traverse to extract the data you need.
To make this easier, what you really want is a way to query the DOM, so that you can extract the data you need without having to traverse the entire tree yourself. The Cascadia.jl project does this for you. It is built on top of Gumbo, and uses CSS selectors as the query language.
So for your example, you could use something like the following to extract all the CAN fields:
julia> using Gumbo
julia> using Cascadia
julia> h=parsehtml(read("/Users/aviks/Download/z1.html", String))
julia> c = matchall(Selector("td:containsOwn(\"CAN:\") + td span"), h.root)
13-element Array{Gumbo.HTMLNode,1}:
Gumbo.HTMLElement{:span}:
<span class="value">184750010210</span>
...
#print all the CAN values
julia> for x in c
           println(x.children[1].text)
       end
184750010210
186170040070
175630130020
172640020290
168330020230
156340030160
118210000020
190490040500
173480080430
161160010050
153510060090
050493000250
050470630910
Hopefully this gives you an idea of how to extract all the data you need.
The current answer is a bit out of date since the readall() function no longer exists. I'll update his answer below.
Here's a general breakdown of the package ecosystem for Julia (as of the time of writing this answer):
Requests.jl is used to download the HTML file itself (note that in avik's answer, he reads the HTML file from his local machine)
Cascadia.jl is required to query the parsed document with CSS selectors (e.g. the selector you would find if you were to use Selector Gadget).
Gumbo.jl is required to parse the resulting HTML
The key thing to remember is that Gumbo stores objects in tree format as HTMLNodes or HTMLElements. So most objects have "parents" and "children." To get the data you need, it's simply a matter of filtering with the right selector (using Cascadia) and then going to the correct point in the Gumbo tree.
An updated version of avik's answer:
using Requests, Cascadia, Gumbo
# r = get(url) # Normally, you'd put a url here, but I couldn't find a way to grab it without having to download it and read it locally
# h = parsehtml(String(r.data)) # Then normally you'd execute this
# Instead, I'm going to read in the html file as a string and give it to Gumbo
h = parsehtml(readstring("z1.html"))
# Exploring with the various structure of Gumbo objects:
println(fieldnames(h.root))
println(fieldnames(h.root.children))
println(size(h.root.children))
# aviks code:
c = matchall(Selector("td:containsOwn(\"CAN:\") + td span"), h.root);
for x in c
    println(x.children[1].text)
end
This particular webpage is more difficult to scrape than most, since it doesn't have a great CSS structure.
There's some nice documentation on workflow on the Cascadia README, but I still had some questions after reading it. For anyone else (like me, yesterday) who comes to this page looking for guidance on web scraping in Julia, I've created a jupyter notebook with a simple example that will hopefully help you understand the workflow in greater detail.

MySQL -> HTML Report, Styled like a Pivot Table

Ok, I'd like to start off by apologizing (profusely), since this seems to be a common question. Most of the examples seem to be somewhat similar, as well, but - for the life of me, I cannot wrap my brain around how to apply the myriad of quality responses to my specific table. And, I'm sure it's probably just the easiest thing in the world, what with all the very thorough responses/examples/links to resources with explanations/etc.
So, I suppose I'll just get right to it. The basics:
We host off-site copies of our clients' backups.
We need to know how much space they're using.
We are not at all consistent in Naming Convention, folder vs. disk per client, etc.
We need to automate a 'report', monthly, with data as follows:
-[C.Srv 01]---Size(GB)--Free(%)
Client 01 [Total] [AVG]
Server 01 109.43 25
Server 02 415.19 25
WHERE C.Srv = [Specified Cloud Server]
Clients Get a Total Size(GB) and an Average Free(%)
My MySQL table is this:
# Name DataType Length/Set Unsigned Allow NULL ZeroFill Default
1. ID INT 11 AUTO_INCREMENT
2. Client TEXT
3. Server TEXT
4. C.Srv TEXT
5. Size DECIMAL 10,2
6. Free DECIMAL 10,4
So, for Example, let's say I have this...
___ ________ ________ _________ _________ _______
ID | CLIENT | SERVER | C.SRV | SIZE | FREE
---|--------|--------|---------|---------|-------
1 | a | adc | cs_01 | 109.43 | 0.2504
2 | a | asql | cs_01 | 415.19 | 0.2504
3 | b | bdc | cs_01 | 583.91 | 0.1930
4 | b | bdev | cs_01 | 316.52 | 0.1930
5 | b | bsql | cs_01 | 1259.56 | 0.1930
6 | c | cdc | cs_01 | 355.30 | 0.7631
7 | d | ddc | cs_01 | 398.21 | 0.5808
Is it possible to get something pretty, in HTML (preferably), that has the basic structure of this...
_______ __________ ________
CS_01 | Size(GB) | Free(%)
-------|----------|--------
-a | 524.62 | 25.04%
-------|----------|--------
adc | 109.43 | 25.04%
asql | 415.19 | 25.04%
-b | 2159.99 | 19.30%
-------|----------|--------
bdc | 583.91 | 19.30%
bdev | 316.52 | 19.30%
bsql | 1259.56 | 19.30%
+c | 355.30 | 76.31%
-------|----------|--------
+d | 398.21 | 58.08%
_______|__________|________
Or, am I just S.O.L.? Format, I can mess with in CSS, or whatever (I hope), just so long as it's in that basic structure. (I don't know if it matters, but the final goal will be to collapse at the Client Level; in case that somehow factors into the approach/data-gathering.)
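
One possible way to get that structure is to do the grouping in a small script and emit the HTML from there. The sketch below is only an illustration under assumptions: it works on the sample rows from the question held in a plain list (in practice they would come from a MySQL query), and the function names and CSS class names are invented for this example.

```python
from html import escape

# Sample rows copied from the question:
# (client, server, cloud_server, size_gb, free_fraction)
ROWS = [
    ("a", "adc", "cs_01", 109.43, 0.2504),
    ("a", "asql", "cs_01", 415.19, 0.2504),
    ("b", "bdc", "cs_01", 583.91, 0.1930),
    ("b", "bdev", "cs_01", 316.52, 0.1930),
    ("b", "bsql", "cs_01", 1259.56, 0.1930),
    ("c", "cdc", "cs_01", 355.30, 0.7631),
    ("d", "ddc", "cs_01", 398.21, 0.5808),
]


def build_report(rows, cloud_server):
    """Group one cloud server's rows by client: total size, average free."""
    groups = {}
    for client, server, csrv, size, free in rows:
        if csrv == cloud_server:
            groups.setdefault(client, []).append((server, size, free))
    report = {}
    for client, servers in groups.items():
        report[client] = {
            "total": round(sum(size for _, size, _ in servers), 2),
            "avg_free": sum(free for _, _, free in servers) / len(servers),
            "servers": servers,
        }
    return report


def row_html(css_class, label, size, free):
    """Render one table row; the class lets client rows be styled or collapsed."""
    return '<tr class="{}"><td>{}</td><td>{:.2f}</td><td>{:.2%}</td></tr>'.format(
        css_class, escape(label), size, free)


def to_html(report, cloud_server):
    """Emit a client row followed by its server rows, in insertion order."""
    lines = ["<table>",
             "<tr><th>{}</th><th>Size(GB)</th><th>Free(%)</th></tr>".format(
                 escape(cloud_server))]
    for client, info in report.items():
        lines.append(row_html("client", client, info["total"], info["avg_free"]))
        for server, size, free in info["servers"]:
            lines.append(row_html("server", server, size, free))
    lines.append("</table>")
    return "\n".join(lines)


report = build_report(ROWS, "cs_01")
print(to_html(report, "cs_01"))
```

Keeping the client rows and server rows in separate CSS classes leaves the collapsing at the client level to CSS or a few lines of JavaScript.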

Multiple Data Sources in Microsoft Excel SQL Query

I have a lot of spreadsheets that pull transactional information from our ERP software into Excel using the Microsoft Query that we then perform other calculations on automatically. Recently we upgraded our ERP system, but management made the decision to leave the transactional history in the old databases to have a clean one going forward in the new system. I still need to have some "rolling 12 months" graphs, but if I use only the old database, I'm missing new data and if I use only the new, I'm missing the last 11 months data.
Is there a way that I can write a query in Excel to pull data from the old database PartTran table and merge it with the new database PartTran table without user intervention each time? For instance, I don't want my users (if possible) to have to have two queries that they copy and paste into one Excel table. The schema of the tables (at least the columns I need) are identically named and defined.
If you want to take a bit of a fun, hacky Excel approach, you could do the "copy-paste" bit FOR your users behind the scenes. Given two similar tables OLD and NEW with structures
+-----+------+-------+------------+
| id | foo | bar | date |
+-----+------+-------+------------+
| 95 | blah | $25 | 2015-06-01 |
| 96 | bork | $12 | 2015-07-01 |
| 97 | bump | $200 | 2015-08-01 |
| 98 | fizz | | 2015-09-01 |
| 99 | buzz | $50 | 2015-10-01 |
| 100 | char | ($1) | 2015-11-01 |
| 101 | mope | | 2015-12-01 |
+-----+------+-------+------------+
and
+----+-----+-------+------------+------+---------+
| id | foo | bar | date | fizz | buzz |
+----+-----+-------+------------+------+---------+
| 1 | cat | ($10) | 2016-01-01 | 285B | 1110111 |
| 2 | dog | $25 | 2016-02-01 | 27F5 | 1110100 |
| 3 | ant | $100 | 2016-03-01 | 1F91 | 1001111 |
+----+-----+-------+------------+------+---------+
... you can union together the data for these two datasets with some prudent excel wizardry as below:
Your UNION table ( named using alt+j+t+a ) should have the following items:
New natural ID
DataSet pointer ( name of old or new table )
Derived ID from original dataset
Columns of data you want from Old & New DataSets
example:
+---------+------------+------------+----+------+-----+------------+------+------+
| UnionId | SourceName | SourceRank | id | foo | bar | date | fizz | buzz |
+---------+------------+------------+----+------+-----+------------+------+------+
| 1 | OLD | | | | | | | |
| 2 | NEW | | | | | | | |
+---------+------------+------------+----+------+-----+------------+------+------+
You will then make judicious use of Indirect() and VlookUp() to derive the lookup id and column targets. Sample code below
SourceRank - helper column
=COUNTIFS([SourceName],[@SourceName],[UnionId],"<="&[@UnionId])
id - the id from the original DataSet
=SMALL(INDIRECT([@SourceName]&"[id]"),[@SourceRank])
Everything else is just VlookUp madness! The formulas for the remaining columns are copied below for reference:
foo =VLOOKUP([@id],INDIRECT([@SourceName]),MATCH(UNION[[#Headers],[foo]],INDIRECT([@SourceName]&"[#Headers]"),0),0)
bar =VLOOKUP([@id],INDIRECT([@SourceName]),MATCH(UNION[[#Headers],[bar]],INDIRECT([@SourceName]&"[#Headers]"),0),0)
date =VLOOKUP([@id],INDIRECT([@SourceName]),MATCH(UNION[[#Headers],[date]],INDIRECT([@SourceName]&"[#Headers]"),0),0)
fizz =VLOOKUP([@id],INDIRECT([@SourceName]),MATCH(UNION[[#Headers],[fizz]],INDIRECT([@SourceName]&"[#Headers]"),0),0)
buzz =VLOOKUP([@id],INDIRECT([@SourceName]),MATCH(UNION[[#Headers],[buzz]],INDIRECT([@SourceName]&"[#Headers]"),0),0)
Output
You'll likely want to make prudent use of If() and/or IfError() to help your users ignore the new column references to the old table and those rows that do not yet have data. Without that, however, you'll end up with something like the below.
This is both ready to accept & read new inputs to both OLD and NEW DataSets and is sortable to get rid of those pesky placeholder rows...
Hope this helps! Happy coding!
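
For anyone who finds the formula chain hard to follow, the same logic can be sketched in Python. This is only an illustration of what the SourceRank, id and VLOOKUP columns compute, not something Excel runs: the function name, the dictionary layout and the trimmed sample rows are assumptions made for this example.

```python
def union_rows(tables, source_names, columns):
    """Mimic the UNION sheet: one output row per (UnionId, SourceName) pair."""
    seen = {}
    out = []
    for union_id, source in enumerate(source_names, start=1):
        # SourceRank: rows with the same SourceName up to this UnionId (COUNTIFS).
        seen[source] = seen.get(source, 0) + 1
        rank = seen[source]
        # id: the rank-th smallest id in the source table (SMALL);
        # None plays the role of a placeholder row with no data yet.
        ids = sorted(row["id"] for row in tables[source])
        row_id = ids[rank - 1] if rank <= len(ids) else None
        # Remaining columns: look the id up in the source table (VLOOKUP);
        # columns the source table lacks stay None.
        match = next((r for r in tables[source] if r["id"] == row_id), {})
        row = {"UnionId": union_id, "SourceName": source,
               "SourceRank": rank, "id": row_id}
        row.update({col: match.get(col) for col in columns})
        out.append(row)
    return out


# A few rows taken from the OLD and NEW tables above.
TABLES = {
    "OLD": [{"id": 95, "foo": "blah", "date": "2015-06-01"},
            {"id": 96, "foo": "bork", "date": "2015-07-01"}],
    "NEW": [{"id": 1, "foo": "cat", "date": "2016-01-01", "fizz": "285B"},
            {"id": 2, "foo": "dog", "date": "2016-02-01", "fizz": "27F5"}],
}

result = union_rows(TABLES, ["OLD", "NEW", "OLD", "NEW", "OLD"],
                    ["foo", "date", "fizz"])
for row in result:
    print(row)
```

The fifth row asks for a third OLD record that does not exist, which is the placeholder case the If()/IfError() wrappers are meant to hide.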

Search Replace in MySQL: remove directory structure but keep filename

I am changing directory structures in a Drupal installation and need to remove all path data except the file name itself.
So the basic structure is:
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
| entity_type | bundle | deleted | entity_id | revision_id | language | delta | field_filename_value | field_filename_format |
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
The filename is stored in field_filename_value. Here's a sample record:
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
| entity_type | bundle | deleted | entity_id | revision_id | language | delta | field_filename_value | field_filename_format |
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
| node | presentation | 0 | 11 | 11 | und | 0 | /really long path name/with lots of words/167 Clarence Ashley - Coo Coo Bird.mp3 | NULL |
+-------------+--------------+---------+-----------+-------------+----------+-------+----------------------------------------------------------------------------------+-----------------------+
That ridiculous filename value needs to be changed from:
/really long path name/with lots of words/167 Clarence Ashley - Coo Coo Bird.mp3
To this:
167 Clarence Ashley - Coo Coo Bird.mp3
Setting aside the bad practice of using spaces in file/directory names, how would you correct this? Is it possible using MySQL features alone?
As an added challenge, some files may be more than 2 directories deep.
Use substring_index
select substring_index('http://www.example.com/dev/archive/examples/test.htm','/',-1)
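(Both of the above are taken from the question "MySQL String Last Index Of".)
With a count of -1, substring_index returns everything to the right of the last /, so a single call strips the whole path no matter how many directories deep the file is. To apply it, use the same expression on field_filename_value in an UPDATE statement.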
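
To see what the function does with the sample record before running anything against the database, here is a small Python emulation of substring_index for non-zero counts. It is a sketch for checking the behaviour, not MySQL itself, and the table name in the comment is a placeholder because the question does not give one.

```python
def substring_index(text, delim, count):
    """Emulate MySQL SUBSTRING_INDEX(text, delim, count) for non-zero count."""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter from the left.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter from the right.
    return delim.join(parts[count:])


path = "/really long path name/with lots of words/167 Clarence Ashley - Coo Coo Bird.mp3"
print(substring_index(path, "/", -1))

# The corresponding statement would look roughly like this
# (my_table is a hypothetical name; back up the table first):
#   UPDATE my_table
#   SET field_filename_value = SUBSTRING_INDEX(field_filename_value, '/', -1);
```

A value that contains no / at all is returned unchanged, so rows that already hold a bare filename are left alone.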

how to pass parameters from html page to batch file

I have the following requirement: I need to pass parameters from an HTML page to a batch file, which in turn passes the parameters to an XML file. I need to know how to pass parameters from HTML to the batch file and from the batch file to the XML file.
Thanks
What kind of "parameters"? What kind of "html page"? What kind of "batch file"? What kind of "xml file"?
Assuming that you mean that data from a HTML form should be processed by a batch file and written to disc as XML:
Data from an HTML form is sent to the web server as an HTTP request. CGI is the classic way for the server to hand that request to an external program, and it's possible to do it with a batch script, probably even a Windows batch file.
However, this is going to be extremely uncomfortable, error-prone and insecure. It's much better to have a language or framework specifically geared towards web applications handle the low-level CGI stuff for you.
Common choices are: PHP, Perl, Java servlets or ASP.
While it's possible to write XML simply by outputting strings, you're virtually guaranteed to get malformed XML eventually.
It's much better to use a real XML framework to produce the XML - there are several to choose from for pretty much any language worth using.
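
As a rough illustration of the last point, here is a minimal Python sketch that turns URL-encoded form data into XML with the standard library's ElementTree instead of string concatenation. The element names (parameters, param) and the sample form data are assumptions made for this example. A real application would receive the form data from its web framework.

```python
from urllib.parse import parse_qs
import xml.etree.ElementTree as ET


def form_to_xml(query_string):
    """Build an XML document from URL-encoded form data.

    The XML library escapes special characters, so values such as
    '&' or '<' cannot produce malformed output.
    """
    root = ET.Element("parameters")
    fields = parse_qs(query_string, keep_blank_values=True)
    for name, values in fields.items():
        for value in values:
            param = ET.SubElement(root, "param", name=name)
            param.text = value
    return ET.tostring(root, encoding="unicode")


# Form data as a browser would submit it: "Tom & Jerry" and "<b>hi</b>".
xml_text = form_to_xml("user=Tom+%26+Jerry&note=%3Cb%3Ehi%3C%2Fb%3E")
print(xml_text)
```

Writing the same values with plain string concatenation would put a raw & and < into the file, which is exactly the malformed-XML case described above.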
m.mahesh.2000, it might be worth you drawing a little diagram of the various parts of the puzzle. HTML and XML files are not programs!
Consider these possible diagrams:
CGI Approach:
+--------------+       +----------------+
|   Browser    |       |   Web Server   |
|              |       |  (eg: Apache)  |
| +----------+ |       | +------------+ |
| |HTML      | |  -->  | |    CGI     | |
| |Javascript| |       | |            | |
| +----------+ |       | | +-------+  | |
+--------------+       | | | Perl  |  | |
                       | | +-------+  | |
                       | +------------+ |
                       +----------------+
Servlet Container Approach:
+--------------+       +------------------+
|   Browser    |       |      Tomcat      |
|              |       |                  |
| +----------+ |       | +-------------+  |
| |HTML      | |  -->  | |   Servlet   |  |
| |Javascript| |       | |  Container  |  |
| +----------+ |       | | +---------+ |  |
+--------------+       | | | Servlet | |  |
                       | | +---------+ |  |
                       | +-------------+  |
                       +------------------+
The browser renders your HTML, executes any JavaScript, and sends HTTP requests to your server, be it Apache, Tomcat, or something else. Do you know what kind of server you have?
Apache spawns child CGI processes to act on certain HTTP requests. CGI processes are typically PHP or Perl scripts.
Tomcat has a number of threads to act on HTTP requests. Some requests are handled by Servlet instances hosted within a Servlet container.
Either the CGI process, or the servlet, will do the work of creating your XML file on the server, and contacting your database.
Hope this helps.
Are the batch file and xml file client or server side?
Either way you will need to add some script to the html file. Or even use server side scripting to generate the html...