hi.c
echo -e "#include<stdio.h>\nint main(){printf(\"Hi\\\n\");}">hi.c;gcc hi.c -o hi;./hi
I seem to remember Git scaling poorly, and years ago I vowed never to contemplate a repository larger than 4 GB. Under the cobwebs of my memory I recall Git failing to pack and clean anything larger on a 32-bit machine. Well, anyway, I’ve just created a dozen repos ranging from 20 MB to 3 GB on the same machine (HP Pavilion, 4 GB RAM, Git 1.7, Linux kernel 2.6.32) and thought I’d record some stats.
Except for initialization (which takes constant time), the time to add and commit, and the repo size before and after, all seem to grow fairly linearly with the amount of data.
Perhaps more data would smooth these curves out.
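The initial data size seems to have a simple multiplicative effect on the repo size: the repository (the .git directory) comes out within a percent of the size of the original data, so the total on disk doubles (after = before + repo ≈ before × 2).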
I’ll be making a few more repos shortly, cloning, and hope to report better stats after a few more commits.
Suppose you have multiple disks and you are concerned about losing data. Well, it’s always a good idea to archive your data at a different location so that you can retrieve it from history even after a catastrophe such as fire or theft. But what about real-time protection against corruption and disk failure? We could mirror the disks (RAID 1: write all data to two disks simultaneously), but that gives at best 50% efficiency (one disk’s worth of space for the price of two).
What about RAID 5? Somehow RAID 5 achieves better than 50% efficiency. With the minimum of three disks we get about 67% efficiency; with ten disks, up to 90%. A single disk can melt in acid and we won’t lose data. How can RAID 5 recover from the total failure of a single disk without replicating the data?
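The efficiency figures follow from the fact that, in each row (stripe), one of the n disks holds parity instead of data. A quick check of the arithmetic, writing n for the number of disks:

```latex
\text{efficiency}(n) = \frac{n-1}{n}, \qquad
\text{efficiency}(3) = \tfrac{2}{3} \approx 66.7\%, \quad
\text{efficiency}(5) = \tfrac{4}{5} = 80\%, \quad
\text{efficiency}(10) = \tfrac{9}{10} = 90\%
```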
Let’s use an example with five disks (80% efficiency). We’ll write a very tiny amount of data, three bits (ones and zeros) on each of four disks (A, B, C, D). The last disk (P) will hold the parity.
A B C D P
---------------
0 1 0 1
1 1 0 0
1 1 1 0
To get the parity, we XOR the bits in each row across the four data disks. XOR (^, or “exclusive or”) of two bits is 0 if they are the same and 1 if they are different. So for the first row:
A^B = 0^1 = 1 (they are different). Then
(A^B)^C = 1^0 = 1 (A^B=1 is different from C=0). Finally
(A^B^C)^D = 1^1 = 0 (Parity = A^B^C^D = 0).
The second row:
A^B = 1^1 = 0
(A^B)^C = 0^0 = 0
(A^B^C)^D = 0^0 = 0 = P
The third and final row:
A^B = 1^1 = 0
(A^B)^C = 0^1 = 1
(A^B^C)^D = 1^0 = 1 = P
So our disks now look like this:
A B C D P
---------------
0 1 0 1 0
1 1 0 0 0
1 1 1 0 1
Now let’s suppose disk C disintegrates:
A B C D P
---------------
0 1 # 1 0
1 1 # 0 0
1 1 # 0 1
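We can recreate C with the same XOR (^) operation, this time including P. This works because x^x = 0 and x^0 = x: since P = A^B^C^D, XORing P with A, B and D cancels those three and leaves only C. Compressing the math a bit: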
First row: C = ((A^B)^D)^P = ((0^1)^1)^0 = (1^1)^0 = 0^0 = 0
Second row: C = ((A^B)^D)^P = ((1^1)^0)^0 = (0^0)^0 = 0^0 = 0
Third row: C = ((A^B)^D)^P = ((1^1)^0)^1 = (0^0)^1 = 0^1 = 1
Voilà.
<!doctype html><html lang="en"><head><meta charset="UTF-8" /><link rel="license"
href="http://genaud.net/2011/poetic-license.txt" />
<title></title>
</head><body><time pubdate
datetime="2011-02-01T10:00:00-03:00"></time>
Hello World
</body></html>
And the header for XHTML 5:
<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"><head>