ClearCase Globally, Git Locally
Thanks
ClearCase deserves all the credit for encouraging me to learn and love Git. Branching and merging are as fast and painless as listing changesets between any arbitrary branches or points in time. Did I say fast? No longer bound by ClearCase’s dictates and laborious linear progression, one can work offline, roll back, experiment in multiple branches, travel into the past, and explore limitless parallel dimensions. While inspired by some other solutions, I believe what follows is the cleanest imposition of a Git repository upon a ClearCase snapshot view.
Summary
The following setup, namely a combined snapshot cloned locally, allows Git to track a ClearCase snapshot view without external tools (such as rsync). It minimizes hijacks and untracked files, and encourages somewhat standard workflows with both ClearCase UCM and Git, without putting any limits on what can be done with a Git repository. This recipe assumes that the reader is proficient with the Unix/Cygwin shell, Git, and ClearCase. In short, we will:
- Initialize a Git repository upon an existing pristine ClearCase snapshot view
- Clone the snapshot as a Git repository
- Track the master branch of the clone from the snapshot
- Perform all ClearCase rebases, updates, checkins, and deliveries only in the snapshot
- Work in branches of the Git clone
- Pull between the master branches of the snapshot and clone repositories
Getting started
It is easiest to start with a fresh ClearCase snapshot view (what we’ll call snapshot). We initialize a Git repository inside it (so the repository is one and the same as the ClearCase snapshot view) and modify the .git/info/exclude file (see below) to hide some of the ClearCase plumbing from Git. Then we add all the tracked content, commit, clone the repository, and add the clone’s master branch as a tracked remote. Now the clone (by default) and snapshot are tracking each other:
$ cd snapshot
$ git init
$ cat >> .git/info/exclude <<'EOF'
.gitignore
view.dat
lost+found/
EOF
$ git add .
$ git commit -m init
$ git clone -o snapshot . ../clone
$ git remote add -t master -m master clone ../clone
(The last step, ‘git remote add’, when last tested does not seem to work. It’s not necessary, but if we could get it to work, it could be cool.)
We will want to keep the snapshot pristine (see below). It should only be used as a staging area between upstream rebases and checkins, and downstream pulls. As far as possible, all development, branches, and merging should occur in the work clone. The snapshot could just as easily be used by multiple people, say a team working on a common project. In that case, I might recommend another bare clone. But here, we’ll assume all Git repositories (including the overlapping ClearCase snapshot) are locally used by one developer.
Git ignore
There are multiple ways to hide files from Git’s tracking view including .gitignore files scattered anywhere within the directory structure. This may very well be appropriate within the clone development branches to hide build artifacts, temporary files, etc. There, in the clone, you should consider whether you want to share the .gitignore files (before committing them) or add .gitignore to the root .gitignore file, thus ignoring itself.
In the snapshot, however, I’ve chosen to use the .git/info/exclude file instead because it is applied to the entire repository and is already hidden from Git’s tree. The snapshot has very different tracking requirements. We’ll want to filter out all of the Git and ClearCase plumbing such as view.dat, .vws, and any .gitignore files that try to come upstream from the clone. However, we generally want to be aware of everything that passes through our snapshot. To that end, we want to ignore very little and consider all untracked files. A file untracked by one system indicates a new addition in the other or a deletion the other doesn’t yet know about. No files should ever be untracked by both ClearCase and Git in the snapshot.
ClearCase update and rebase
I always run “Find modified files” from the ClearCase explorer before rebasing or delivering to ClearCase. Check in, undo hijacks, etc., as appropriate to preserve a pristine snapshot. After updating or rebasing the snapshot, we’ll need to add (or remove) the upstream changes in Git. Modified files can be easily added to the Git index on commit (with the -a flag).
# cd to snapshot
# rebase or update from ClearCase upstream
$ git status
$ git add (/some/files)
$ git rm (/old/files)
# repeat above until ClearCase changeset is in the index (no untracked files)
$ git commit -a -m "some comment"
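When the add/rm loop above gets tedious, `git add -A` stages additions, modifications, and deletions in one step. The following is a minimal sketch, assuming it is run from the snapshot root right after the ClearCase update; the function name `stage_cc_update` is only illustrative.

```shell
# Stage everything a ClearCase update or rebase changed in the snapshot
# (new, modified, and removed files), then list what was staged so it
# can be reviewed before committing.
stage_cc_update() {
    git add -A . &&
    git diff --cached --name-status
}
```

Review the listing, then commit as usual with `git commit -m "some comment"`. Files matched by .git/info/exclude are still left out.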
Downstream development
Assuming we’ve been developing in one or many branches within the clone, eventually some changes are bound to emerge interesting and stable enough to share with others. We’ll use the master branch of clone to stage our merges before delivering a pretty package upstream. First, we’ll need to be in sync with the snapshot. If snapshot is not pristine, we need to get it to that state. If there are commits in snapshot not found in the clone’s master, we should pull (or rebase).
After clone’s master is equal to snapshot or contains a strict superset of changes, we can stage our changes in the master branch of clone. How we do that in the sidestream branches is completely up to you: rebase, pull, merge, squash, octopus, rebase -i. We commit a new feature into its own branch ready to be merged and pulled upstream. Our workflow might look something like this:
$ cd ../clone
$ diff -r . ../snapshot
(nothing)
$ git checkout feature
$ git rebase master
# test, work, test
$ git checkout master
$ git merge feature
$ git branch -d feature
Checkin to ClearCase
In the snapshot we’ll pull from clone’s master and deliver the changes upstream. We may have to manually add, remove, and checkout/in our changes to ClearCase. To help, we can tag before pulling and display the file names as a difference along with the status (new, deleted, modified).
$ cd ../snapshot
$ git tag before
$ git pull ../clone master
$ git diff --name-status before
(The ‘git remote add’ could have been handy here)
Automation
While the difference above could help us deliver upstream manually, the output of the last line could just as well feed a script that delivers to ClearCase, though I imagine procedures differ from environment to environment. I have not automated the ClearCase delivery myself, but here is a rough sketch:
$ git diff --name-status before > diff_before
$ grep '^A' diff_before | sed "s/^../cleartool mkelem /"
$ grep '^M' diff_before | sed "s/^../cleartool co /"
$ grep '^D' diff_before | sed "s/^../cleartool rmname /"
$ grep '^[AM]' diff_before | sed "s/^../cleartool ci /"
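The same idea can be wrapped in one function that copes with spaces in file names. This is only a dry-run sketch: it prints the commands for review instead of running them, it does not check out parent directories (which mkelem and rmname need), and it does not deal with hijacked files. The function name `cc_commands` and the use of `-nc` (no comment) are my own choices for illustration.

```shell
# Turn `git diff --name-status` output (read from stdin) into cleartool
# commands, printed for review rather than executed.
# Assumption: status and path are separated by a tab, as Git prints them.
cc_commands() {
    awk -F '\t' '
        $1 == "A" { printf "cleartool mkelem -nc \"%s\"\n", $2; ci[++n] = $2 }
        $1 == "M" { printf "cleartool co -nc \"%s\"\n", $2;     ci[++n] = $2 }
        $1 == "D" { printf "cleartool rmname -nc \"%s\"\n", $2 }
        END {
            # Check in only what was created or checked out above.
            for (i = 1; i <= n; i++)
                printf "cleartool ci -nc \"%s\"\n", ci[i]
        }
    '
}
```

Typical use would be `git diff --name-status before | cc_commands`, reading the result carefully before pasting any of it into a shell.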
Pristine
Similarly, the pristine state could be checked with ‘git status’ and the ClearCase explorer. However, I’ve found the following commands helpful:
$ find . -type f -writable | grep -v lost+found | grep -v view.dat
$ find . -type f -name '*keep'
$ find . -type d -name '*unloaded'
$ git diff --name-only HEAD
$ git ls-files --others | grep -v .git | grep -v lost+found | grep -v view.dat
$ cleartool ls -recurse -view -short | grep -v lost+found
$ cleartool lsco -me -recurse -short
$ cleartool ls -recurse | grep "\[hijacked\]"
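The two artifact searches can be folded into a single check with an exit status, which is handy in scripts. This is a sketch only, covering just the *.keep files and *.unloaded directories and skipping the .git directory; the name `check_artifacts` is made up for this example.

```shell
# Report ClearCase leftovers under a directory (default: current one):
# *.keep files and *.unloaded directories, ignoring anything in .git.
# Returns 0 when nothing is found, 1 otherwise.
check_artifacts() {
    dir=${1:-.}
    found=$(find "$dir" -path "$dir/.git" -prune -o \
        \( -type f -name '*.keep' -o -type d -name '*.unloaded' \) -print)
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
        return 1
    fi
    return 0
}
```

For example, `check_artifacts . || echo "snapshot is not pristine"`.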
Or a script which simplifies the above. This may evolve into a full git-clearcase tool if it proves useful:
$ pristine
Checking writable... OK
Checking artifacts... OK
Checking Git status... OK
Checking CC Untrack... OK
Checking CC Checkouts...OK
Checking CC Hijacks... OK

$ pristine --help
usage: pristine [-[waguch]] [ dir... ]
Flags: Check directories for...
  -w writable files (possibly hijacked)
  -a artifacts such as *.keep and *.unloaded
  -g Git status including untracked files
  -u ClearCase untracked files and directories
  -c ClearCase checked out files and directories
  -h ClearCase hijacked files and directories
Note: The flags above are ordered considering speed and likelihood of failure (-w) to the slower operations (-ch). The ClearCase checks may be slower than other checks. -w is a reasonable substitute for -h although not technically the same (a file may be readonly and still hijacked)
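The "Checking ... OK" lines follow a simple pattern: run a command and treat any output as a failure. Here is a hedged sketch of that pattern, not the actual pristine source; the helper name `run_check` is hypothetical.

```shell
# Run a command and report it pristine-style: the check passes only if
# the command prints nothing on stdout. On failure the offending output
# is shown and the function returns 1.
run_check() {
    label=$1
    shift
    printf 'Checking %s... ' "$label"
    output=$("$@" 2>/dev/null) || true
    if [ -z "$output" ]; then
        echo OK
    else
        echo FAILED
        printf '%s\n' "$output"
        return 1
    fi
}
```

For instance, `run_check "Git status" git status --porcelain` passes only when Git sees no changes and no untracked files.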
Happy coding.
Comments
Hi,
Being quite fond of ClearCase, even though it needs a lot of setup adaptation to work nicely, I found this article interesting since distributed development interests me.
This concept works fine – if you do not care about history, traceability and similar concepts. I tried the suggested concept, and found out that git is not able to distinguish between (1) a rename and (2) deletion of a file/folder and addition of another. This “feature” – to my regret – makes use of the suggested procedure a nightmare for any Configuration Manager in ClearCase since it will sabotage the content in the ClearCase VOB.
Hi, I’m not sure I follow what you mean by lost history and traceability. Sure, if you commit a dozen times before checking in/delivering to CC (or vice versa), you lose the micro-history, but otherwise, I’m not sure I follow.
I agree, Git makes no distinction between moving and delete-and-recreating content. In both cases it is atomic. I find that “feature” (with or without quotes) excellent. How is that a problem?
Hi,
In ClearCase, every file and folder is an element. Everything that is done with that element is stored. If you move a file or folder, you can still – at any time – trace that move and see differences between branches of that element.
If you make a move using the “git method” you suggest, you change the folder element twice. cleartool rmname removes an element from the folder and cleartool mkelem creates another element – of course with its own history, not in any way connected to its origin. This is a problem e.g. in bigger projects, where newcomers usually need to be able to see what others have done before and are doing in other branches.
What you probably need to do is to find out if something has been moved (git diff-index -M --name-status --cached) and then generate cleartool mv as well as checkout and merge any changes to an element.
If I get the time to write a script for that, I’ll send it over in some way.
Hi (same person?),
I see what you mean. Though, I think ‘my method’ is more sophisticated than the method above. Above, I only show the difference between any changeset in git vs. clearcase. It’s up to the local user to add, delete, move, etc. Git generally DOES KNOW that a file has been atomically moved (rather than independently deleted and recreated with a new name/location), so it should be possible to inform clearcase, albeit a practical nuisance. I have begun writing scripts locally which align git deletions with clearcase unloads, hijacks with modifications, etc., but they’re not robust enough yet to publish. I’ll give some thought to moves as well.
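To illustrate the move case only: with `-M`, `git diff --name-status` reports a rename as `R<score>`, the old path, and the new path, separated by tabs. A minimal sketch that turns those lines into `cleartool mv` commands follows. It prints them for review, leaves directory checkouts to the user, and the name `cc_moves` is purely illustrative.

```shell
# Print a `cleartool mv` for every rename Git detected.
# Input (stdin): output of `git diff -M --name-status <tag>`, where a
# rename looks like "R<score><TAB>old path<TAB>new path".
cc_moves() {
    awk -F '\t' '$1 ~ /^R/ {
        printf "cleartool mv -nc \"%s\" \"%s\"\n", $2, $3
    }'
}
```

A rename with a score below 100 also changed content, so the moved element would still need a checkout and checkin afterwards.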
Cheers,
Alex
A python script for importing/exporting to Clearcase along similar lines:
http://github.com/charleso/git-cc/tree/master
Hope it helps.
Trying to find solution for different aspects of the same problem, maybe interesting for you
Thanks for this; I have been using it for a few months now, and it works pretty well for me.
However, I am now working on the new version of my project, and we created a new stream in ClearCase for this version (I have no idea if this is the standard way of using ClearCase; I am told that CC has no concept of trunk/branches).
I’m wondering if it is possible to make the new CC stream feed into a branch in the same git repository. I’m sure I can create a new repository, but I think it would be extremely useful to be able to switch branches in the same workspace, in case I need to switch back to the old branch to fix a bug or something.
I’m just not sure how to add the new view (in CC) as a new branch of an existing git repository. Any ideas?
Thanks!
I just saw this post about using ClearCase and Git together. Since you seem to know your way around this topic, I wondered if you would mind helping me with some research.
I am investigating the degree of potential interest among large enterprises in integrating Git with traditional software configuration management tools like ClearCase, AccuRev or Perforce.
Could I ask your opinion on the value of using ClearCase (or another SCM tool ) on top of Git? What would you say is the value-add of using both, instead of just Git? In terms of visibility, security, release engineering or other factors?
Any insights would be greatly appreciated.
Thanks very much,
Jon Friedman
jfriedmanlex@gmail.com / +1 781 328-2240
Hi Jon,
I have used RCS, CVS, SVN, VSS, Hg, Bzr, Git and others. I have never used AccuRev nor Perforce. Furthermore I have never used Git as corporate policy. I have not used ClearCase in the last two years. My experience with CC was not positive. It was extraordinarily slow, the work flow restrictive, and branching/merging was a multi-day painful exercise; even personal stream creation was an ordeal.
I use Git personally because it makes me more efficient. Other CC users would come to me to ask simple questions, such as ‘what files have changed between two dates/versions?’ which could be answered in seconds.
I do not know if Git would work in a corporate setting with a diverse user base. It is reputed to be difficult to understand (not that CC was intuitive). Though, considering the number of dedicated CC administrators, I believe their time would be better spent managing and helping DVCS users. But honestly, I do not know from experience. I understand Git seems to work with a massive number of distributed kernel developers.
> …large enterprises in integrating Git with
> traditional software configuration management
> tools like ClearCase, AccuRev or Perforce.
I would refuse a job if I was required to use ClearCase and denied supplemental tools such as Git.
Perhaps CC gave management some sense of control or predictability, but our required work flow with CC alone made me extraordinarily unproductive. Git allowed me to both conform to company policy while providing a useful personal work flow. I pulled from ClearCase periodically into my local Git repo and worked on numerous local branches. Whenever one branch was ready to commit, I would pull (CC rebase), test, then create a CC view, push, and commit the CC view. In other words my CC branches were opened, committed, and closed nearly atomically. Yes, I might have lost some data on my hard drive over a weekend, but I never did, and I rarely lost any sleep over it.
> value of using ClearCase … on top of Git?
I don’t mean to be overly semantic, but I believe what I have done is use Git on top of CC. I would certainly never use CC if Git were company policy. I saw NO (0%) personal value in CC (opaque corporate value, maybe). However, I felt compelled to use something like Git as long as CC was company policy, which while it added an extra few minutes on upstream commits, saved me several hours each week (day?) rebasing and merging.
> What would you say is the value-add of using both
> instead of just Git?
Git chokes on large repositories (5GB+). While one should divide projects into modules, if a project were quite massive, I could see the value-add of an upstream tool which scaled infinitely. However, that does not mean that the majority of VCS users should ever see that upstream tool, which I believe is a purely configuration management and deployment concern.
Git does not enforce a specific work-flow. On the other hand, repositories can easily be one directional, meaning that some person/party can be responsible for committing to ‘official’ repositories (even in a hierarchy), which I think is a better model than blind restrictions. CC might reduce major catastrophe, just as a person in a straitjacket is less likely to hurt themselves or others.
Alex:
Thanks very much for taking the time to make this detailed and thoughtful reply. It is very helpful.
If I could ask one follow-up question: For the situation you described (ClearCase or another tool upstream, developers using Git locally) – Do you think management or release engineering were better off than if Git alone were in use? From their perspective, would the advantages in managing and tracking the project outweigh the extra effort to support two tools? This assumes, as you say, that the majority of VCS users would only touch the upstream tool when they needed to check code out or back in.
Cheers, Jon
Given the exact same environment “management or release engineering were better off [with CC] than if Git alone were in use”. But…
Given two companies, both with competent developers, using different upstream VCS (Git and CC), each with competent administrators of the chosen VCS, then I believe the company using Git alone would have made the superior choice.
As for “ClearCase or another tool upstream”, I am not aware of another tool that would be appropriate in a dual environment. Of the products I am familiar with, performance/scalability and verifiability (cryptographic checksums) seem to be mutually exclusive. I do not believe requiring the use of two tools with the same objectives is good corporate policy. The “extra effort to support two tools” can not be justified. There should be one “official” repository and work-flow/policy, but the use of alternative tools should not be micromanaged nor even supported on the local machines.
Likewise, if the release team thought another tool were more appropriate for their specific needs, then they themselves could pull from various DVCS repos. In that sense, those from whom they are pulling do not need to know about their hypothetical release management tool. Regardless of the VCS used by the release team, local use of Git need not be a company decision. Companies should allow developers to use the tools that make them individually more productive.
If the developers find that their repositories are getting too large, their VCS no longer scales/performs, and it is inappropriate to break their projects into smaller modules, then I have little further advice for them. Perhaps they could consider Perforce.
Despite CC’s enormous (perhaps unnecessary) complexity, it does have a GUI that exposes a large subset of its features. Git, I believe, is best used on the command line; there are users who are afraid of the command line or believe its learning curve is too long. Without GUI tools, Git might be a hard sell for some enterprises.
In summary, I believe:
1) an enterprise should not enforce a two VCS policy
2) should not restrict the voluntary use of other tools as long as users conform to the company’s work flow (checkin) policy
3) Git is an appropriate choice for moderately sized repositories, whose users are not afraid of the command line
4) CC is rarely the optimal VCS choice
5) A scalable and verifiable DVCS with an intuitive GUI could be the optimal choice, if such a product existed
I use git on a daily basis on top of the Company’s Perforce repository.
Our main products repository is 3.2GiB in size, and 50,000 files.
Both Git and Mercurial have been performing extremely fast, with Git having the edge on large scale operations, and providing the better workflow (I use Mercurial’s MQ extension too).
Perforce on the other hand, has been finding it difficult to scale to more than a dozen simultaneous users over a global network (3 continents) with several proxies.
As I write this, our Perforce server is down. This is the third time in a year. It usually imposes a downtime of at least two days, with usually database corruption and partial recovery.
For the money it costs, Perforce makes no sense.
Hi Filippo,
That’s interesting to hear. I was/am under the (third hand) impression that Perforce’s single best selling point was/is scalability. In my personal experience 3.2 GiB is about as big as I’ll allow a Git repo to grow because of my inability to pack larger repos on my 32-bit machines.
Alex
Nice, I loved the Haiku in the license agreement!
Thanks very much,
Thanks for the information and for pristine. FWIW, I found I had to make the following very minor fixes to pristine version 0.9.3
diff -r af2d7074c7dd windows/bin/pristine
--- a/windows/bin/pristine Thu Oct 13 13:17:47 2011 -0700
+++ b/windows/bin/pristine Thu Oct 13 14:14:33 2011 -0700
@@ -347,7 +347,7 @@
echo Log directory has been preserved: $DIR
echo
echo New untracked files:
- cat $DIR/git_untrack
+ cat $DIR/cc_untrack
echo
echo FAILURE: Snapshot is not pristine
exit 1
@@ -419,7 +419,7 @@
echo Log directory has been preserved: $DIR
echo
echo Hijacked tracked files:
- cat $DIR/git_changes
+ cat $DIR/cc_hijack
echo
echo FAILURE: Snapshot is not pristine
exit 1
Hi ACGarland,
Thanks for the patch. Unfortunately, I lost a large number of changes when I moved on from the company where we’d been using ClearCase. I haven’t touched ClearCase ever since. I trust your changes work well.
Thanks again,
Alex
There is a minor flaw with your pristine script: it indirectly uses the .gitignore or .git/info/exclude files as pattern suppliers for grep (the “-v -f file”) to ignore files. However, grep uses a regular expression syntax while git uses filename globbing patterns.
There is a “*.updt” line in my exclude file that tells git to ignore the logfiles created by the cleartool update command. However, the pristine script keeps reporting, e.g.,
Writable (hijacked) files:
update.2011-12-16T114852+0100.updt
because the globbing pattern “*.updt” is not parsed by grep; for the latter it should be “.*\.updt”.
So when compiling the $EXCLUDE file, which actually contains grep regular expressions, not filename globbing patterns, the stream ought to be filtered through sed with a substitution like “s/\./\\./g;s/\*/.*/g”.
That should work pretty well in most cases, however, the “!” and “/” rules of gitignore are still ignored (no pun!) by grep.
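A minimal sketch of that filter as a reusable function, under the same limits (only "." and "*" are translated; "!" and "/" rules are passed through unchanged). The function name `glob_to_regex` is hypothetical. It also drops blank lines, because an empty pattern given to `grep -f` matches every line, so `grep -v -f` would then hide everything.

```shell
# Convert simple gitignore-style glob lines (stdin) into grep basic
# regular expressions (stdout).
# Handles only "." and "*"; comment lines and blank lines are dropped.
glob_to_regex() {
    sed -e '/^#/d' -e '/^$/d' -e 's/\./\\./g' -e 's/\*/.*/g'
}
```

For example, `glob_to_regex < .git/info/exclude > tmp.exclude` turns `*.updt` into `.*\.updt`, after which `grep -v -f tmp.exclude` no longer reports the update logs.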
Regards
Gan
Another issue:
When checking for checkouts, don’t forget the “-cview” option unless you want all your views checked, which might cause pristine to give false positives!
Regards
Gan
A small improvement I missed in the last patch: you can avoid the cat and grep invocations:
--- pristine 2011-12-22 10:21:47.000000000 +0100
+++ /home/adiiacm1/bin/pristine 2011-12-21 17:08:29.612148300 +0100
@@ -63,7 +63,7 @@
# after appending gitignore to an exclude file
#
EXCLUDE=`pwd`/tmp.exclude
-echo .git/ > $EXCLUDE
+echo '\.git/' > $EXCLUDE
PREV_DIR=`pwd`
while [ 1 -eq 1 ]; do
if [ `pwd` = "/" -o `pwd` = "\\" ]; then
@@ -72,11 +72,11 @@
cd $PREV_DIR
exit 1
elif [ -e .gitignore ]; then
- cat .gitignore | grep -v "^#" >> $EXCLUDE
+ sed '/^#/d;s/\./\\./g;s/\*/.*/g' .gitignore >> $EXCLUDE
fi
if [ -d .git ]; then
if [ -e .git/info/exclude ]; then
- cat .git/info/exclude | grep -v "^#" >> $EXCLUDE
+ sed '/^#/d;s/\./\\./g;s/\*/.*/g' .git/info/exclude >> $EXCLUDE
fi
DIR=`pwd`/.git/pristine
rm -rf $DIR
@@ -363,7 +363,7 @@
echo -n Checking \(-c\) CC Checkouts...
if [ $RUN_CC_CHECKOUTS = "TRUE" ]; then
- cleartool lsco -me -recurse -short $FILE_LIST $DIR_LIST >\
+ cleartool lsco -me -cview -recurse -short $FILE_LIST $DIR_LIST >\
$DIR/cc_checkout
if [ `cat $DIR/cc_checkout | wc -l` != "0" ]; then
Regards
Gan
Hello,
I made changes to your init script. See below.
Notes:
1) Now uses echo instead of cat, to reduce the need for interactive prompts and eliminates possibly entering needless ^M characters (e.g., when viewed in vi). This makes it slightly more platform-consistent.
2) Adds a blank .gitignore file to every empty ClearCase directory (without this, the above ‘diff’ command outputs a ton of garbage about missing directories in the clone repository).
If I have some free time, I might extend this further so that creating the initial ClearCase snapshot can be done from a single script. This would allow parameterizing the clone and snapshot directory names and automatically nest them under a parent directory. I will also look into ACGarland’s changes.
Also, thank you SO MUCH for this tutorial. A ClearCase snapshot update takes 93 seconds on my machine (benchmarked with PowerShell: “Measure-Command {cleartool update}”), and we also don’t use branching to work on features – only for release management. Being able to feature branch on my own machine will allow me to produce UI prototypes much faster. You are my hero.
cd snapshot
git init
find . -type d -empty -exec touch {}/.gitignore \;
echo .gitignore >> .git/info/exclude
echo view.dat >> .git/info/exclude
echo lost+found/ >> .git/info/exclude
git add .
git commit -m init
git clone -o snapshot . ../clone
git remote add -t master -m master clone ../clone
Hi, thanks for this article! How does ClearCase react to the .git folder in the snapshot directory? I obviously don’t want to commit .git into ClearCase. In the article you explain how to make git ignore the ClearCase metadata, but the other way round isn’t very clear to me. Thanks again.
Hi Felix, I haven’t used CC in three years, but I don’t recall the .git directory being a problem. If it’s not committed to CC, then it’s unlikely to mistakenly get absorbed. There must be some CC-ignore mechanism, but I don’t recall off hand, sorry.