j0ke.net Open Build Service > Projects > devel:subversion > cvs2svn > ready
File "ready" of package cvs2svn: the upstream diff from cvs2svn 1.5.x to 2.0.0 (released 15 August 2007).
diff -purNbBwx .svn cvs2svn-1.5.x/BUGS cvs2svn-2.0.0/BUGS --- cvs2svn-1.5.x/BUGS 2006-05-17 19:55:25.000000000 +0200 +++ cvs2svn-2.0.0/BUGS 2007-08-15 22:53:54.000000000 +0200 @@ -9,31 +9,43 @@ for that, see http://cvs2svn.tigris.org/issue_tracker.html -Before reporting a bug, check to see +Before reporting a bug: - a) if you are already running the latest version of cvs2svn + a) Verify that you are running the latest version of cvs2svn. - b) if your bug is already filed in the issue tracker (see - http://tinyurl.com/2uxwv for a list of all open bugs). + b) Read the current frequently-asked-questions list at + http://cvs2svn.tigris.org/faq.html to see if your problem has a + known solution, and to help determine if your problem is caused + by corruption in your CVS repository. + + c) Check to see if your bug is already filed in the issue tracker + (see http://tinyurl.com/2uxwv for a list of all open bugs). Then, mail your bug report to dev@cvs2svn.tigris.org. To be useful, a bug report should include the following information: * The revision of cvs2svn you ran. Run 'cvs2svn --version' to - discover this. + determine this. - * The version of Subversion you used it with. + * The version of Subversion you used it with. Run 'svnadmin + --version' to determine this. * The exact cvs2svn command line you invoked, and the output it produced. + * The contents of the configuration file that you used (if you used + the --config option). + * The data you ran it on. If your CVS repository is small (only a few kilobytes), then just provide the repository itself. If it's large, or if the data is confidential, then please try to come up with some smaller, releasable data set that still stimulates the - bug. Often this is just a matter of invoking cvs2svn on deeper - subdirectories, until you find the minimal reproduction case. + bug. 
The cvs2svn project includes one script that can often help + you narrow down the source of the bug to just a few *,v files, + and another that helps strip proprietary information out of your + repository. See the FAQ (http://cvs2svn.tigris.org/faq.html) for + more information. The most important thing is that we be able to reproduce the bug :-). If we can reproduce it, we can usually fix it. If we can't reproduce diff -purNbBwx .svn cvs2svn-1.5.x/CHANGES cvs2svn-2.0.0/CHANGES --- cvs2svn-1.5.x/CHANGES 2007-01-28 22:46:15.000000000 +0100 +++ cvs2svn-2.0.0/CHANGES 2007-08-15 22:53:54.000000000 +0200 @@ -1,3 +1,50 @@ +Version 2.0.0 (15 August 2007) +------------------------------ + + New features: + * Add --use-internal-co to speed conversions, and make it the default. + * Add --retain-conflicting-attic-files option. + * Add --no-cross-branch-commits option. + * Add --default-eol option and deprecate --no-default-eol. + * RevisionRecorder hook allows file text/deltas to be recorded in pass 1. + * RevisionReader hook allow file text to be retrieved from RevisionRecorder. + * Slightly changed the order that properties are set, for more flexibility. + * Don't set svn:keywords on files for which svn:eol-style is not set. + * Implement issue #53: Allow --trunk='' for --trunk-only conversions. + + Bugs fixed: + * Fix issue #97: Follow symlinks within CVS repository. + * Fix issue #99: cvs2svn tries to create a file twice. + * Fix issue #100: cvs2svn doesn't retrieve the right version. + * Fix issue #105: Conflict between directory and Attic file causes crash. + * Fix issue #106: SVNRepositoryMirrorParentMissingError. + * Fix missing command-line handling of --fallback-encoding option. + * Fix issue #85: Disable symbol sanity checks with in --trunk-only mode. + + Improvements and output changes: + * Analyze CVS revision dependency graph, giving a more robust conversion. + * Improve choice of symbol parents when CVS history is ambiguous. 
+ * In the case of clock skew to the past, resync forwards, not backwards. + * Treat timestamps that lie in the future as bogus, and adjust backwards. + * Gracefully handle tags that refer to nonexistent revisions. + * Check and fail if revision header appears multiple times. + * Gracefully handle multiple deltatext blocks for same revision. + * Be more careful about only processing reasonable *,v files. + * Improve checks for illegal filenames. + * Check if a directory name conflicts with a filename. + * When file is imported, omit the empty revision 1.1. + * If a non-trunk default branch is excluded, graft its contents to trunk. + * Omit the initial 'dead' revision when a file is added on a branch. + * Require --symbol-transform pattern to match entire symbol name. + * Treat files as binary by default instead of as text, because it is safer. + * Treat auto-props case-insensitively; deprecate --auto-props-ignore-case. + + Miscellaneous: + * Add a simple (nonportable) script to log cvs2svn memory usage. + * Allow contrib/shrink_test_case.py script to try deleting tags and branches. + * Add --skip-initial-test option to contrib/shrink_test_case.py script. + + Version 1.5.1 (28 January 2007) ------------------------------- diff -purNbBwx .svn cvs2svn-1.5.x/COPYING cvs2svn-2.0.0/COPYING --- cvs2svn-1.5.x/COPYING 2006-05-17 19:55:25.000000000 +0200 +++ cvs2svn-2.0.0/COPYING 2007-08-15 22:53:54.000000000 +0200 @@ -10,7 +10,7 @@ number incremented: .../license-2.html, on), you may use a newer version instead, at your option. ==================================================================== -Copyright (c) 2000-2006 CollabNet. All rights reserved. +Copyright (c) 2000-2007 CollabNet. All rights reserved. 
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are diff -purNbBwx .svn cvs2svn-1.5.x/HACKING cvs2svn-2.0.0/HACKING --- cvs2svn-1.5.x/HACKING 2006-10-03 22:27:53.000000000 +0200 +++ cvs2svn-2.0.0/HACKING 2007-08-15 22:53:54.000000000 +0200 @@ -10,100 +10,13 @@ http://svn.collab.net/repos/svn/trunk/ww Also: - - Read design-notes.txt, you won't regret it. + - Read doc/design-notes.txt, you won't regret it. - Read the class documentation in cvs2svn. - Please put a new test in run-tests.py when you fix a bug. - Use 2 spaces between sentences in comments and docstrings. (This helps sentence-motion commands in some editors.) + - doc/making-releases.txt describes the procedure for making a new + release of cvs2svn. Happy hacking! - -Making releases -=============== - - Pre-release (repeat as appropriate): - A. Backport changes if appropriate. - B. Update CHANGES. - C. Run the testsuite, check everything is OK. - D. Trial-run ./dist.sh, check the output is sane. - - Notes for specific releases: - - Creating the release: - 1. If this is an A.B.0 release, make a branch: - svn copy http://cvs2svn.tigris.org/svn/cvs2svn/trunk \ - http://cvs2svn.tigris.org/svn/cvs2svn/branches/A.B.x - and then increment the -dev VERSION in cvs2svn on trunk. - 2. Set the release number and date in CHANGES on trunk. - 3. Switch to a branch working copy. - 4. Merge CHANGES to the release branch. - 5. Make a trial distribution and see that the unit tests run: - ./dist.sh - tar -xzf cvs2svn-A.B.C.tar.gz - cd cvs2svn-A.B.C - ./run-tests.py - cd .. - rm -rf cvs2svn-A.B.C - 6. Set VERSION in cvs2svn and then run: - svn copy . http://cvs2svn.tigris.org/svn/cvs2svn/tags/A.B.C - 7. Increment the -dev VERSION in cvs2svn on the A.B.x branch. - 8. Switch to the tag. - 9. Run: - ./dist.sh - 10. Create a detached signature for the tar file: - gpg --detach-sign -a cvs2svn-A.B.C.tar.gz - - Publishing the release: - 1. 
Upload tarball and signature to website download area. - 2. Move old releases into the 'Old' folder of the download area. - 3. Create a project announcement on the website. - 4. Send an announcement to announce@cvs2svn.tigris.org. - (users@cvs2svn.tigris.org is subscribed to announce, so there is - no need to send to both lists.) - 5. Update the topic on #cvs2svn. - - -Release announcement templates -============================== - -Here are suggested release announcement templates. Fill in the substitutions -as appropriate, and refer to previous announcements for examples. - -Web: -[[[ -cvs2svn VERSION is now released. -<br /> -The MD5 checksum is CHECKSUM -<br /> -For more information see <a -href="http://cvs2svn.tigris.org/source/browse/cvs2svn/tags/VERSION/CHANGES?view=markup" ->CHANGES</a>. -<br /> -Download: <a -href="http://cvs2svn.tigris.org/files/documents/1462/NNNNN/cvs2svn-VERSION.tar.gz" ->cvs2svn-VERSION.tar.gz</a>. -]]] - -Email: -[[[ -Subject: cvs2svn VERSION released -To: announce@cvs2svn.tigris.org -Reply-to: users@cvs2svn.tigris.org - -cvs2svn VERSION is now released. - -BRIEF_SUMMARY_OF_VERSION_HIGHLIGHTS - -For more information see: -http://cvs2svn.tigris.org/source/browse/cvs2svn/tags/VERSION/CHANGES?view=markup - -You can get it here: -http://cvs2svn.tigris.org/files/documents/1462/NNNNN/cvs2svn-VERSION.tar.gz - -The MD5 checksum is CHECKSUM. - -Please send any bug reports and comments to users@cvs2svn.tigris.org. - -YOUR_NAME, on behalf of the cvs2svn development team. 
-]]] diff -purNbBwx .svn cvs2svn-1.5.x/MANIFEST.in cvs2svn-2.0.0/MANIFEST.in --- cvs2svn-1.5.x/MANIFEST.in 2006-10-03 17:40:47.000000000 +0200 +++ cvs2svn-2.0.0/MANIFEST.in 2007-08-15 22:53:54.000000000 +0200 @@ -1,10 +1,11 @@ include *.py cvs2svn.1 dist.sh MANIFEST.in Makefile -include README BUGS COMMITTERS COPYING HACKING design-notes.txt CHANGES +include README BUGS COMMITTERS COPYING HACKING CHANGES include cvs2svn-example.options recursive-include svntest * recursive-include test-data * +recursive-include doc * recursive-include www * -recursive-include contrib *.py +recursive-include contrib *.py cvs2svn_memlog cvsVsvn.pl prune www/tigris-branding prune www/xhtml1-20020801 prune www/xhtml1.catalog diff -purNbBwx .svn cvs2svn-1.5.x/Makefile cvs2svn-2.0.0/Makefile --- cvs2svn-1.5.x/Makefile 2006-08-21 17:17:22.000000000 +0200 +++ cvs2svn-2.0.0/Makefile 2007-08-15 22:53:54.000000000 +0200 @@ -26,9 +26,12 @@ install: check: ${PYTHON} ./run-tests.py +pycheck: + pychecker cvs2svn_lib/*.py + clean: - rm -rf cvs2svn-*.tar.gz build tmp - for d in . cvs2svn_lib cvs2svn_rcsparse svntest ; \ + -rm -rf cvs2svn-*.tar.gz build cvs2svn-tmp + -for d in . 
cvs2svn_lib cvs2svn_rcsparse svntest ; \ do \ rm -f $$d/*.pyc $$d/*.pyo; \ done diff -purNbBwx .svn cvs2svn-1.5.x/PKG-INFO cvs2svn-2.0.0/PKG-INFO --- cvs2svn-1.5.x/PKG-INFO 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/PKG-INFO 2007-08-15 22:54:49.000000000 +0200 @@ -0,0 +1,10 @@ +Metadata-Version: 1.0 +Name: cvs2svn +Version: 2.0.0 +Summary: CVS-to-Subversion repository converter +Home-page: http://cvs2svn.tigris.org/ +Author: The cvs2svn Team +Author-email: <dev@cvs2svn.tigris.org> +License: Apache-style +Description: UNKNOWN +Platform: UNKNOWN diff -purNbBwx .svn cvs2svn-1.5.x/contrib/.dired cvs2svn-2.0.0/contrib/.dired --- cvs2svn-1.5.x/contrib/.dired 2006-09-08 01:03:40.000000000 +0200 +++ cvs2svn-2.0.0/contrib/.dired 1970-01-01 01:00:00.000000000 +0100 @@ -1,4 +0,0 @@ -Local Variables: -dired-omit-files-p: t -dired-omit-extensions: (".pyc" ".pyo") -End: diff -purNbBwx .svn cvs2svn-1.5.x/contrib/__init__.py cvs2svn-2.0.0/contrib/__init__.py --- cvs2svn-1.5.x/contrib/__init__.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/contrib/__init__.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,18 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. 
+# ==================================================================== + +"""Allow this directory to be imported as a module.""" + diff -purNbBwx .svn cvs2svn-1.5.x/contrib/cvs2svn_memlog cvs2svn-2.0.0/contrib/cvs2svn_memlog --- cvs2svn-1.5.x/contrib/cvs2svn_memlog 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/contrib/cvs2svn_memlog 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,122 @@ +#!/usr/bin/env python +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2000-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""Run cvs2svn, but logging memory usage. + +Memory use is logged every MemoryLogger.interval seconds. This script +takes the same parameters as cvs2svn. + +Memory use is determined by reading from the /proc filesystem. This +method is not very portable, but should hopefully work on a typical +modern Linux.""" + + +import sys +import os + +# Make sure this Python is recent enough. Do this as early as possible, +# using only code compatible with Python 1.5.2 before the check. 
+if sys.hexversion < 0x02020000: + sys.stderr.write("ERROR: Python 2.2 or higher required.\n") + sys.exit(1) + +sys.path.insert(0, os.path.dirname(os.path.dirname(sys.argv[0]))) + +import re +import time +import getopt +import threading + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.common import FatalException +from cvs2svn_lib.log import Log +from cvs2svn_lib.main import main + + +usage_string = """\ +USAGE: %(progname)s [--interval=<value>] [--help|-h] -- <cvs2svn-args> + ('--' is required, to separate %(progname)s options from the options + and arguments that will be passed through to cvs2svn.) + + --interval=VALUE Specify the time in seconds between memory logs. + --help, -h Print this usage message. +""" + + +def usage(f=sys.stderr): + f.write(usage_string % {'progname' : sys.argv[0]}) + + +rss_re = re.compile(r'^VmRSS\:\s+(?P<mem>.*)$') + +def get_memory_used(): + filename = '/proc/%d/status' % (os.getpid(),) + for l in open(filename).readlines(): + l = l.strip() + m = rss_re.match(l) + if m: + return m.group('mem') + + return 'Unknown' + + +class MemoryLogger(threading.Thread): + def __init__(self, interval): + threading.Thread.__init__(self) + self.setDaemon(True) + self.start_time = time.time() + self.interval = interval + + def run(self): + i = 0 + while True: + delay = self.start_time + self.interval * i - time.time() + if delay > 0: + time.sleep(delay) + Log().write('Memory used: %s' % (get_memory_used(),)) + i += 1 + + +try: + opts, args = getopt.getopt(sys.argv[1:], 'h', [ + 'interval=', + 'help', + ]) +except getopt.GetoptError, e: + sys.stderr.write('Unknown option: %s\n' % (e,)) + usage() + sys.exit(1) + +interval = 1.0 + +for opt, value in opts: + if opt == '--interval': + interval = float(value) + elif opt in ['-h', '--help']: + usage(sys.stdout) + sys.exit(0) + + +MemoryLogger(interval=interval).start() + +try: + main(sys.argv[0], args) +except FatalException, e: + sys.stderr.write(str(e)) + sys.exit(1) + + diff -purNbBwx .svn 
cvs2svn-1.5.x/contrib/cvsVsvn.pl cvs2svn-2.0.0/contrib/cvsVsvn.pl --- cvs2svn-1.5.x/contrib/cvsVsvn.pl 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/contrib/cvsVsvn.pl 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,194 @@ +#!/usr/bin/perl -w + +# +# (C) 2005 The Measurement Factory http://www.measurement-factory.com/ +# This software is distributed under Apache License, version 2.0. +# + +# The cvsVsvn.pl script compares CVS and Subversion projects. The +# user is notified if project files maintained by CVS differ from +# those maintained by Subversion: +# +# $ CVSROOT=/my/cvsrepo ./cvsVsvn project1 svn+ssh://host/path/projects/p1 +# collecting CVS tags... +# found 34 CVS tags +# comparing tagged snapshots... +# HEAD snapshots appear to be the same +# ... +# CVS and SVN repositories differ because cvs_foo and svn_tags_foo +# export directories differ in cvsVsvn.tmp +# +# The comparison is done for every CVS tag and branch (including +# HEAD), by exporting corresponding CVS and Subversion snapshots and +# running 'diff' against the two resulting directories. One can edit +# the script or use the environment variable DIFF_OPTIONS to alter +# 'diff' behavior (e.g., ignore differences in some files). + +# Commit logs are not compared, unfortunately. This script is also +# confused by files that differ due to different keyword expansion by +# CVS and SVN. + +use strict; + +# cvsVsvn exports a user-specified module from CVS and Subversion +# repositories and compares the two exported directories using the +# 'diff' tool. The procedure is performed for all CVS tags (including +# HEAD and branches). 
+ +die(&usage()) unless @ARGV == 2; +my ($CvsModule, $SvnModule) = @ARGV; + +my $TmpDir = 'cvsVsvn.tmp'; # directory to store temporary files + +my @Tags = &collectTags(); + +print(STDERR "comparing tagged snapshots...\n"); +foreach my $tagPair (@Tags) { + &compareTags($tagPair->{cvs}, $tagPair->{svn}); +} + +print(STDERR "CVS and Subversion repositories appear to be the same\n"); +exit(0); + +sub collectTags { + print(STDERR "collecting CVS tags...\n"); + + my @tags = ( + { + cvs => 'HEAD', + svn => 'trunk' + } + ); + + # get CVS log headers with symbolic tags + my %names = (); + my $inNames; + my $cmd = sprintf('cvs rlog -h %s', $CvsModule); + open(IF, "$cmd|") or die("cannot execute $cmd: $!, stopped"); + while (<IF>) { + if ($inNames) { + my ($name, $version) = /\s+(\S+):\s*(\d\S*)/; + if ($inNames = defined $version) { + my @nums = split(/\./, $version); + my $isBranch = + (2*int(@nums/2) != @nums) || + (@nums > 2 && $nums[$#nums-1] == 0); + my $status = $isBranch ? 'branches' : 'tags'; + my $oldStatus = $names{$name}; + next if $oldStatus && $oldStatus eq $status; + die("change in $name tag status, stopped") if $oldStatus; + $names{$name} = $status; + } + } else { + $inNames = /^symbolic names:/; + } + } + close(IF); + + while (my ($name, $status) = each %names) { + my $tagPair = { + cvs => $name, + svn => sprintf('%s/%s', $status, $name) + }; + push (@tags, $tagPair); + } + + printf(STDERR "found %d CVS tags\n", scalar @tags); + return @tags; +} + +sub compareTags { + my ($cvsTag, $svnTag) = @_; + + &prepDirs(); + + &cvsExport($cvsTag); + &svnExport($svnTag); + + &diffDir($cvsTag, $svnTag); + + # identical directories, clean up + &cleanDirs(); +} + +sub diffDir { + my ($cvsTag, $svnTag) = @_; + my $cvsDir = &cvsDir($cvsTag); + my $svnDir = &svnDir($svnTag); + + my $same = systemf('diff --brief -b -B -r "%s" "%s"', + $cvsDir, $svnDir) == 0; + die("CVS and SVN repositories differ because ". 
+ "$cvsDir and $svnDir export directories differ in $TmpDir; stopped") + unless $same; + + print(STDERR "$cvsTag snapshots appear to be the same\n"); + return 0; +} + +sub makeDir { + my $dir = shift; + &systemf('mkdir %s', $dir) == 0 or die("cannot create $dir: $!, stopped"); +} + +sub prepDirs { + &makeDir($TmpDir); + chdir($TmpDir) or die($!); +} + +sub cleanDirs { + chdir('..') or die($!); + &systemf('rm -irf %s', $TmpDir) == 0 + or die("cannot delete $TmpDir: $!, stopped"); +} + +sub cvsExport { + my ($cvsTag) = @_; + + my $dir = &cvsDir($cvsTag); + &makeDir($dir); + &systemf('cvs -Q export -r %s -d %s %s', $cvsTag, $dir, $CvsModule) == 0 + or die("cannot export $cvsTag of CVS module '$CvsModule', stopped"); +} + +sub svnExport { + my ($svnTag) = @_; + + my $dir = &svnDir($svnTag); + my $cvsOk = + &systemf('svn list %s/%s > /dev/null', $SvnModule, $svnTag) == 0 + && &systemf('svn -q export %s/%s %s', $SvnModule, $svnTag, $dir) == 0; + die("cannot export $svnTag of svn module '$SvnModule', stopped") + unless $cvsOk && -d $dir; +} + +sub tag2dir { + my ($category, $tag) = @_; + + my $dir = sprintf('%s_%s', $category, $tag); + # remove dangerous chars + $dir =~ s/[^A-z0-9_\.\-]+/_/g; + return $dir; +} + +sub cvsDir { + return &tag2dir('cvs', @_); +} + +sub svnDir { + return &tag2dir('svn', @_); +} + +sub systemf { + my ($fmt, @params) = @_; + + my $cmd = sprintf($fmt, (@params)); + #print(STDERR "$cmd\n"); + return system($cmd); +} + +sub usage { + return "usage: $0 <CVS module name> <Subversion URL>\n"; +} + + diff -purNbBwx .svn cvs2svn-1.5.x/contrib/show-db.py cvs2svn-2.0.0/contrib/show-db.py --- cvs2svn-1.5.x/contrib/show-db.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/contrib/show-db.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,209 @@ +#!/usr/bin/env python + +import anydbm +import marshal +import sys +import os +import getopt +import cPickle as pickle +from cStringIO import StringIO + +sys.path.insert(0, 
os.path.dirname(os.path.dirname(sys.argv[0]))) + +from cvs2svn_lib import config +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.artifact_manager import artifact_manager + + +def usage(): + cmd = sys.argv[0] + sys.stderr.write('Usage: %s OPTION [DIRECTORY]\n\n' % os.path.basename(cmd)) + sys.stderr.write( + 'Show the contents of the temporary database files created by cvs2svn\n' + 'in a structured human-readable way.\n' + '\n' + 'OPTION is one of:\n' + ' -R SVNRepositoryMirror revisions table\n' + ' -N SVNRepositoryMirror nodes table\n' + ' -r rev SVNRepositoryMirror node tree for specific revision\n' + ' -m MetadataDatabase\n' + ' -f CVSFileDatabase\n' + ' -c PersistenceManager SVNCommit table\n' + ' -C PersistenceManager cvs-revs-to-svn-revnums table\n' + ' -i CVSItemDatabase (normal)\n' + ' -I CVSItemDatabase (filtered)\n' + ' -p file Show the given file, assuming it contains a pickle.\n' + '\n' + 'DIRECTORY is the directory containing the temporary database files.\n' + 'If omitted, the current directory is assumed.\n') + sys.exit(1) + + +def print_node_tree(db, key="0", name="<rootnode>", prefix=""): + print "%s%s (%s)" % (prefix, name, key) + if name[:1] != "/": + dict = marshal.loads(db[key]) + items = dict.items() + items.sort() + for entry in items: + print_node_tree(db, entry[1], entry[0], prefix + " ") + + +def show_int2str_db(fname): + db = anydbm.open(fname, 'r') + k = map(int, db.keys()) + k.sort() + for i in k: + print "%6d: %s" % (i, db[str(i)]) + +def show_str2marshal_db(fname): + db = anydbm.open(fname, 'r') + k = db.keys() + k.sort() + for i in k: + print "%6s: %s" % (i, marshal.loads(db[i])) + +def show_str2pickle_db(fname): + db = anydbm.open(fname, 'r') + k = db.keys() + k.sort() + for i in k: + o = pickle.loads(db[i]) + print "%6s: %r" % (i, o) + print " %s" % (o,) + +def show_str2ppickle_db(fname): + db = anydbm.open(fname, 'r') + k = db.keys() + k.remove('_') + k.sort(key=lambda s: int(s, 
16)) + u1 = pickle.Unpickler(StringIO(db['_'])) + u1.load() + for i in k: + u2 = pickle.Unpickler(StringIO(db[i])) + u2.memo = u1.memo.copy() + o = u2.load() + print "%6s: %r" % (i, o) + print " %s" % (o,) + +def show_cvsitemstore(): + for cvs_file_items in Ctx()._cvs_items_db.iter_cvs_file_items(): + items = cvs_file_items.values() + items.sort(key=lambda i: i.id) + for item in items: + print "%6x: %r" % (item.id, item,) + + +def show_filtered_cvs_item_store(): + from cvs2svn_lib.cvs_item_database import IndexedCVSItemStore + db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_FILTERED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_FILTERED_INDEX_TABLE), + DB_OPEN_READ) + + ids = list(db.iterkeys()) + ids.sort() + for id in ids: + cvs_item = db[id] + print "%6x: %r" % (cvs_item.id, cvs_item,) + + + +class ProjectList: + """A mock project-list that can be assigned to Ctx().projects.""" + + def __init__(self): + self.projects = {} + + def __getitem__(self, i): + return self.projects.setdefault(i, 'Project%d' % i) + + +def prime_ctx(): + am = artifact_manager + + def rf(filename): + am.register_temp_file(filename, None) + + from cvs2svn_lib.common import DB_OPEN_READ + from cvs2svn_lib.symbol_database import SymbolDatabase + from cvs2svn_lib.cvs_file_database import CVSFileDatabase + rf(config.CVS_FILES_DB) + rf(config.SYMBOL_DB) + from cvs2svn_lib.cvs_item_database import OldCVSItemStore + from cvs2svn_lib.metadata_database import MetadataDatabase + rf(config.METADATA_DB) + rf(config.CVS_ITEMS_STORE) + rf(config.CVS_ITEMS_FILTERED_STORE) + rf(config.CVS_ITEMS_FILTERED_INDEX_TABLE) + artifact_manager.pass_started(None) + + Ctx().projects = ProjectList() + Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._cvs_items_db = OldCVSItemStore( + am.get_temp_file(config.CVS_ITEMS_STORE) + ) + Ctx()._metadata_db = MetadataDatabase(DB_OPEN_READ) + +def main(): + am = artifact_manager + try: + 
opts, args = getopt.getopt(sys.argv[1:], "RNr:mlfcCiIp:") + except getopt.GetoptError: + usage() + + if len(args) > 1 or len(opts) != 1: + usage() + + if len(args) == 1: + Ctx().tmpdir = args[0] + + for o, a in opts: + if o == "-R": + show_int2str_db(config.SVN_MIRROR_REVISIONS_TABLE) + elif o == "-N": + show_str2marshal_db( + config.SVN_MIRROR_NODES_STORE, + config.SVN_MIRROR_NODES_INDEX_TABLE + ) + elif o == "-r": + try: + revnum = int(a) + except ValueError: + sys.stderr.write('Option -r requires a valid revision number\n') + sys.exit(1) + db = anydbm.open(config.SVN_MIRROR_REVISIONS_TABLE, 'r') + key = db[str(revnum)] + db.close() + db = anydbm.open(config.SVN_MIRROR_NODES_STORE, 'r') + print_node_tree(db, key, "Revision %d" % revnum) + elif o == "-m": + show_str2marshal_db(config.METADATA_DB) + elif o == "-f": + show_str2pickle_db(config.CVS_FILES_DB) + elif o == "-c": + prime_ctx() + show_str2ppickle_db( + config.SVN_COMMITS_INDEX_TABLE, config.SVN_COMMITS_STORE + ) + elif o == "-C": + show_str2marshal_db(config.CVS_REVS_TO_SVN_REVNUMS) + elif o == "-i": + prime_ctx() + show_cvsitemstore() + elif o == "-I": + prime_ctx() + show_filtered_cvs_item_store() + elif o == "-p": + obj = pickle.load(open(a)) + print repr(obj) + print obj + else: + usage() + sys.exit(2) + + +if __name__ == '__main__': + main() diff -purNbBwx .svn cvs2svn-1.5.x/contrib/shrink_test_case.py cvs2svn-2.0.0/contrib/shrink_test_case.py --- cvs2svn-1.5.x/contrib/shrink_test_case.py 2006-09-21 10:47:56.000000000 +0200 +++ cvs2svn-2.0.0/contrib/shrink_test_case.py 2007-08-15 22:53:53.000000000 +0200 @@ -3,7 +3,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2006 CollabNet. All rights reserved. +# Copyright (c) 2006-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -37,17 +37,50 @@ if the bug is still present, and fail if import sys import os import shutil +import getopt +from cStringIO import StringIO sys.path.insert(0, os.path.dirname(os.path.dirname(sys.argv[0]))) from cvs2svn_lib.key_generator import KeyGenerator +import cvs2svn_rcsparse + + +from contrib.rcs_file_filter import WriteRCSFileSink +from contrib.rcs_file_filter import FilterSink + + +usage_string = """\ +USAGE: %(progname)s [OPT...] CVSREPO TEST_COMMAND + + CVSREPO is the path to a CVS repository. + + ***THE REPOSITORY WILL BE DESTROYED*** + + TEST_COMMAND is a command that runs successfully (i.e., with exit + code '0') if the bug is still present, and fails if the bug is + absent. + +Valid options: + --skip-initial-test Assume that the bug is present when run on the initial + repository. Usually this fact is verified + automatically. + --help, -h Print this usage message. +""" + + verbose = 1 tmpdir = 'shrink_test_case-tmp' file_key_generator = KeyGenerator(1) + +def usage(f=sys.stderr): + f.write(usage_string % {'progname' : sys.argv[0]}) + + def get_tmp_filename(): return os.path.join(tmpdir, 'f%07d.tmp' % file_key_generator.gen_id()) @@ -72,6 +105,15 @@ def command(cmd, *args): class Modification: """A reversible modification that can be made to the repository.""" + def get_size(self): + """Return the estimated size of this modification. + + This should be approximately the number of bytes by which the + problem will be shrunk if this modification is successful. It + is used to choose the order to attempt the modifications.""" + + raise NotImplementedError() + def modify(self): """Modify the repository. 
@@ -89,31 +131,52 @@ class Modification: raise NotImplementedError() - def try_mod(self): + def try_mod(self, test_command): + if verbose >= 1: + sys.stdout.write('Testing with the following modifications:\n') + self.output(sys.stdout, ' ') self.modify() try: - command(*test_command) + test_command() except CommandFailedException: if verbose >= 1: sys.stdout.write( - 'The bug disappeared after the following modifications ' - '(which were reverted):\n' + 'The bug disappeared. Reverting modifications.\n' ) - self.output(sys.stdout, ' ') else: - sys.stdout.write( - 'Attempted modification unsuccessful.\n' - ) + sys.stdout.write('Attempted modification unsuccessful.\n') self.revert() return False + except KeyboardInterrupt: + sys.stderr.write('Interrupted. Reverting last modifications.\n') + self.revert() + raise + except Exception: + sys.stderr.write( + 'Unexpected exception. Reverting last modifications.\n' + ) + self.revert() + raise else: self.commit() + if verbose >= 1: + sys.stdout.write('The bug remains. Keeping modifications.\n') + else: sys.stdout.write( 'The bug remains after the following modifications:\n' ) self.output(sys.stdout, ' ') return True + def get_submodifications(self, success): + """Return a generator or iterable of submodifications. + + Return submodifications that should be tried after this this + modification. SUCCESS specifies whether this modification was + successful.""" + + return [] + def output(self, f, prefix=''): raise NotImplementedError() @@ -121,9 +184,63 @@ class Modification: return str(self) +class EmptyModificationListException(Exception): + pass + + +class SplitModification(Modification): + """Holds two modifications split out of a failing modification. + + Because the original modification failed, it known that mod1+mod2 + can't succeed. 
So if mod1 succeeds, mod2 need not be attempted + (though its submodifications are attempted).""" + + def __init__(self, mod1, mod2): + # Choose mod1 to be the larger modification: + if mod2.get_size() > mod1.get_size(): + mod1, mod2 = mod2, mod1 + + self.mod1 = mod1 + self.mod2 = mod2 + + def get_size(self): + return self.mod1.get_size() + + def modify(self): + self.mod1.modify() + + def revert(self): + self.mod1.revert() + + def commit(self): + self.mod1.commit() + + def get_submodifications(self, success): + if success: + for mod in self.mod2.get_submodifications(False): + yield mod + else: + yield self.mod2 + + for mod in self.mod1.get_submodifications(success): + yield mod + + def output(self, f, prefix=''): + self.mod1.output(f, prefix=prefix) + + def __str__(self): + return 'SplitModification(%s, %s)' % (self.mod1, self.mod2,) + + class CompoundModification(Modification): def __init__(self, modifications): + if not modifications: + raise EmptyModificationListException() self.modifications = modifications + self.size = sum(mod.get_size() for mod in self.modifications) + + def get_size(self): + return self.size def modify(self): for modification in self.modifications: @@ -137,6 +254,25 @@ class CompoundModification(Modification) for modification in self.modifications: modification.commit() + def get_submodifications(self, success): + if success: + # All modifications were completed successfully; no need + # to try subsets: + pass + elif len(self.modifications) == 1: + # Our modification list cannot be subdivided, but maybe + # the remaining modification can: + for mod in self.modifications[0].get_submodifications(False): + yield mod + else: + # Create subsets of each half of the list and put them in + # a SplitModification: + n = len(self.modifications) // 2 + yield SplitModification( + create_modification(self.modifications[:n]), + create_modification(self.modifications[n:]) + ) + def output(self, f, prefix=''): for modification in self.modifications: 
modification.output(f, prefix=prefix) @@ -145,9 +281,38 @@ class CompoundModification(Modification) return str(self.modifications) +def create_modification(mods): + """Create and return a Modification based on the iterable MODS. + + Raise EmptyModificationListException if mods is empty.""" + + mods = list(mods) + if len(mods) == 1: + return mods[0] + else: + return CompoundModification(mods) + + +def compute_dir_size(path): + # Add a little bit for the directory itself. + size = 100L + for filename in os.listdir(path): + subpath = os.path.join(path, filename) + if os.path.isdir(subpath): + size += compute_dir_size(subpath) + elif os.path.isfile(subpath): + size += os.path.getsize(subpath) + + return size + + class DeleteDirectoryModification(Modification): def __init__(self, path): self.path = path + self.size = compute_dir_size(self.path) + + def get_size(self): + return self.size def modify(self): self.tempfile = get_tmp_filename() @@ -161,6 +326,27 @@ class DeleteDirectoryModification(Modifi shutil.rmtree(self.tempfile) self.tempfile = None + def get_submodifications(self, success): + if success: + # The whole directory could be deleted; no need to recurse: + pass + else: + # Try deleting subdirectories: + mods = [ + DeleteDirectoryModification(subdir) + for subdir in get_dirs(self.path) + ] + if mods: + yield create_modification(mods) + + # Try deleting files: + mods = [ + DeleteFileModification(filename) + for filename in get_files(self.path) + ] + if mods: + yield create_modification(mods) + def output(self, f, prefix=''): f.write('%sDeleted directory %r\n' % (prefix, self.path,)) @@ -171,6 +357,10 @@ class DeleteDirectoryModification(Modifi class DeleteFileModification(Modification): def __init__(self, path): self.path = path + self.size = os.path.getsize(self.path) + + def get_size(self): + return self.size def modify(self): self.tempfile = get_tmp_filename() @@ -191,96 +381,380 @@ class DeleteFileModification(Modificatio return 'DeleteFile(%r)' % self.path 
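The classes above implement a delta-debugging-style search: CompoundModification batches many deletions into one attempt, and when the batch fails, SplitModification bisects it and each half is retried, largest modifications first. A minimal standalone sketch of that bisection loop (hypothetical helper names, simplified to plain lists; not part of cvs2svn):

```python
def shrink(items, still_fails):
    """Find the subset of ITEMS that must be kept so that
    still_fails(remaining) stays True (i.e., the bug still reproduces).

    Mirrors the strategy above: try removing a whole chunk at once;
    if the bug disappears, bisect the chunk and retry each half."""
    removed = []
    todo = [list(items)]
    while todo:
        chunk = todo.pop()
        remaining = [i for i in items
                     if i not in removed and i not in chunk]
        if still_fails(remaining):
            # The bug survives without this chunk: commit the deletion.
            removed.extend(chunk)
        elif len(chunk) > 1:
            # Deleting the whole chunk made the bug vanish: bisect it.
            n = len(chunk) // 2
            todo.append(chunk[:n])
            todo.append(chunk[n:])
    return [i for i in items if i not in removed]
```

Here still_fails plays the role of the test command invoked by try_mod: it should return True while the bug is still reproducible. For example, if the failure needs items 3 and 7, shrink(list(range(10)), lambda r: 3 in r and 7 in r) reduces the list to [3, 7].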
-def try_modification_combinations(mods): - """Try to do as many modifications from the list as possible. +def rev_tuple(revision): + retval = [int(s) for s in revision.split('.') if int(s)] + if retval[-2] == 0: + del retval[-2] + return tuple(retval) - Return True if any modifications were successful.""" - # A list of lists of modifications that should still be tried: - todo = [mods] +class RCSFileFilter: + def get_size(self): + raise NotImplementedError() - retval = False + def get_filter_sink(self, sink): + raise NotImplementedError() - while todo: - mods = todo.pop(0) - if not mods: - continue - elif len(mods) == 1: - retval = retval | mods[0].try_mod() - elif CompoundModification(mods).try_mod(): - # All modifications, together, worked. - retval = True - else: - # We can't do all of them at once. Try doing subsets of each - # half of the list: - n = len(mods) // 2 - todo.extend([mods[:n], mods[n:]]) + def filter(self, text): + fout = StringIO() + sink = WriteRCSFileSink(fout) + filter = self.get_filter_sink(sink) + cvs2svn_rcsparse.parse(StringIO(text), filter) + return fout.getvalue() - return retval + def get_subfilters(self): + return [] + + def output(self, f, prefix=''): + raise NotImplementedError() -def get_dirs(path): - for filename in os.listdir(path): - subdir = os.path.join(path, filename) - if os.path.isdir(subdir): - yield subdir +class DeleteTagRCSFileFilter(RCSFileFilter): + class Sink(FilterSink): + def __init__(self, sink, tagname): + FilterSink.__init__(self, sink) + self.tagname = tagname + + def define_tag(self, name, revision): + if name != self.tagname: + FilterSink.define_tag(self, name, revision) + def __init__(self, tagname): + self.tagname = tagname -def get_files(path): - for filename in os.listdir(path): - subdir = os.path.join(path, filename) - if os.path.isfile(subdir): - yield subdir + def get_size(self): + return 50 + def get_filter_sink(self, sink): + return self.Sink(sink, self.tagname) -def try_delete_subdirs(path): - """Try 
to delete subdirectories under PATH (recursively).""" + def output(self, f, prefix=''): + f.write('%sDeleted tag %r\n' % (prefix, self.tagname,)) - # First try to delete the subdirectories themselves: - mods = [ - DeleteDirectoryModification(subdir) - for subdir in get_dirs(path) - ] - try_modification_combinations(mods) - # Now recurse into any remaining subdirectories and do the same: - for subdir in get_dirs(path): - try_delete_subdirs(subdir) +def get_tag_set(path): + class TagCollector(cvs2svn_rcsparse.Sink): + def __init__(self): + self.tags = set() + + # A map { branch_tuple : name } for branches on which no + # revisions have yet been seen: + self.branches = {} + + def define_tag(self, name, revision): + revtuple = rev_tuple(revision) + if len(revtuple) % 2 == 0: + # This is a tag (as opposed to branch) + self.tags.add(name) + else: + self.branches[revtuple] = name + def define_revision( + self, revision, timestamp, author, state, branches, next + ): + branch = rev_tuple(revision)[:-1] + try: + del self.branches[branch] + except KeyError: + pass -def try_delete_files(path): - mods = [ - DeleteFileModification(filename) - for filename in get_files(path) + def get_tags(self): + tags = self.tags + for branch in self.branches.values(): + tags.add(branch) + return tags + + tag_collector = TagCollector() + cvs2svn_rcsparse.parse(open(path, 'rb'), tag_collector) + return tag_collector.get_tags() + + +class DeleteBranchTreeRCSFileFilter(RCSFileFilter): + class Sink(FilterSink): + def __init__(self, sink, branch_rev): + FilterSink.__init__(self, sink) + self.branch_rev = branch_rev + + def is_on_branch(self, revision): + revtuple = rev_tuple(revision) + return revtuple[:len(self.branch_rev)] == self.branch_rev + + def define_tag(self, name, revision): + if not self.is_on_branch(revision): + FilterSink.define_tag(self, name, revision) + + def define_revision( + self, revision, timestamp, author, state, branches, next + ): + if not self.is_on_branch(revision): + 
branches = [ + branch + for branch in branches + if not self.is_on_branch(branch) ] + FilterSink.define_revision( + self, revision, timestamp, author, state, branches, next + ) - try_modification_combinations(mods) + def set_revision_info(self, revision, log, text): + if not self.is_on_branch(revision): + FilterSink.set_revision_info(self, revision, log, text) + + def __init__(self, branch_rev, subbranch_tree): + self.branch_rev = branch_rev + self.subbranch_tree = subbranch_tree + + def get_size(self): + return 100 + + def get_filter_sink(self, sink): + return self.Sink(sink, self.branch_rev) + + def get_subfilters(self): + for (branch_rev, subbranch_tree) in self.subbranch_tree: + yield DeleteBranchTreeRCSFileFilter(branch_rev, subbranch_tree) - # Now recurse into any remaining subdirectories and do the same: - for subdir in get_dirs(path): - try_delete_files(subdir) + def output(self, f, prefix=''): + f.write( + '%sDeleted branch %s\n' + % (prefix, '.'.join([str(s) for s in self.branch_rev]),) + ) -cvsrepo = sys.argv[1] -test_command = sys.argv[2:] +def get_branch_tree(path): + """Return the forest of branches in path. + Return [(branch_revision, [sub_branch, ...]), ...], where + branch_revision is a revtuple and sub_branch has the same form as + the whole return value. 
+ + """ + + class BranchCollector(cvs2svn_rcsparse.Sink): + def __init__(self): + self.branches = {} + + def define_revision( + self, revision, timestamp, author, state, branches, next + ): + parent = rev_tuple(revision)[:-1] + if len(parent) == 1: + parent = (1,) + entry = self.branches.setdefault(parent, []) + for branch in branches: + entry.append(rev_tuple(branch)[:-1]) + + def _get_subbranches(self, parent): + retval = [] + try: + branches = self.branches[parent] + except KeyError: + return [] + del self.branches[parent] + for branch in branches: + subbranches = self._get_subbranches(branch) + retval.append((branch, subbranches,)) + return retval + + def get_branches(self): + retval = self._get_subbranches((1,)) + assert not self.branches + return retval + + branch_collector = BranchCollector() + cvs2svn_rcsparse.parse(open(path, 'rb'), branch_collector) + return branch_collector.get_branches() + + +class RCSFileModification(Modification): + """A Modification that involves changing the contents of an RCS file.""" + + def __init__(self, path, filters): + self.path = path + self.filters = filters[:] + self.size = 0 + for filter in self.filters: + self.size += filter.get_size() + + def get_size(self): + return self.size + + def modify(self): + self.tempfile = get_tmp_filename() + shutil.move(self.path, self.tempfile) + text = open(self.tempfile, 'rb').read() + for filter in self.filters: + text = filter.filter(text) + open(self.path, 'wb').write(text) + + def revert(self): + shutil.move(self.tempfile, self.path) + self.tempfile = None + + def commit(self): + os.remove(self.tempfile) + self.tempfile = None + + def get_submodifications(self, success): + if success: + # All filters completed successfully; no need to try + # subsets: + pass + elif len(self.filters) == 1: + # The last filter failed; see if it has any subfilters: + subfilters = list(self.filters[0].get_subfilters()) + if subfilters: + yield RCSFileModification(self.path, subfilters) + else: + n = 
len(self.filters) // 2 + yield SplitModification( + RCSFileModification(self.path, self.filters[:n]), + RCSFileModification(self.path, self.filters[n:]) + ) + + def output(self, f, prefix=''): + f.write('%sModified file %r\n' % (prefix, self.path,)) + for filter in self.filters: + filter.output(f, prefix=(prefix + ' ')) + + def __str__(self): + return 'RCSFileModification(%r)' % (self.filters,) + + +def try_modification_combinations(test_command, mods): + """Try each modification in MODS and its submodifications. + + Successful modifications are kept; the rest are reverted.""" + + # A list of modifications that should still be tried: + todo = list(mods) + + while todo: + todo.sort(key=lambda mod: mod.get_size()) + mod = todo.pop() + success = mod.try_mod(test_command) + # Now add possible submodifications to the list of things to try: + todo.extend(mod.get_submodifications(success)) + + +def get_dirs(path): + filenames = os.listdir(path) + filenames.sort() + for filename in filenames: + subpath = os.path.join(path, filename) + if os.path.isdir(subpath): + yield subpath + + +def get_files(path, recurse=False): + filenames = os.listdir(path) + filenames.sort() + for filename in filenames: + subpath = os.path.join(path, filename) + if os.path.isfile(subpath): + yield subpath + elif recurse and os.path.isdir(subpath): + for x in get_files(subpath, recurse=recurse): + yield x + + +def shrink_repository(test_command, cvsrepo): + try_modification_combinations( + test_command, [DeleteDirectoryModification(cvsrepo)] + ) + + # Try deleting branches: + mods = [] + for path in get_files(cvsrepo, recurse=True): + branch_tree = get_branch_tree(path) + if branch_tree: + filters = [] + for (branch_revision, subbranch_tree) in branch_tree: + filters.append( + DeleteBranchTreeRCSFileFilter( + branch_revision, subbranch_tree + ) + ) + mods.append(RCSFileModification(path, filters)) + if mods: + try_modification_combinations(test_command, mods) + + # Try deleting tags: + mods = [] + for path in 
get_files(cvsrepo, recurse=True): + tags = list(get_tag_set(path)) + if tags: + tags.sort() + filters = [DeleteTagRCSFileFilter(tag) for tag in tags] + mods.append(RCSFileModification(path, filters)) + + if mods: + try_modification_combinations(test_command, mods) + + +first_fail_message = """\ +ERROR! The test command failed with the original repository. The +test command should be designed so that it succeeds (indicating that +the bug is still present) with the original repository, and fails only +after the bug disappears. Please fix your test command and start +again. +""" + + +def main(): + try: + opts, args = getopt.getopt(sys.argv[1:], 'h', [ + 'skip-initial-test', + 'help', + ]) + except getopt.GetoptError, e: + sys.stderr.write('Unknown option: %s\n' % (e,)) + usage() + sys.exit(1) + + skip_initial_test = False + + for opt, value in opts: + if opt in ['--skip-initial-test']: + skip_initial_test = True + elif opt in ['-h', '--help']: + usage(sys.stdout) + sys.exit(0) + else: + sys.exit('Internal error') + + cvsrepo = args[0] + + def test_command(): + command(*args[1:]) if not os.path.isdir(tmpdir): os.makedirs(tmpdir) - + if not skip_initial_test: # Verify that test_command succeeds with the original repository: try: - command(*test_command) + test_command() except CommandFailedException, e: - sys.stderr.write( - 'ERROR! 
The test command failed with the original repository.\n' - 'The test command should be designed so that it succeeds\n' - '(indicating that the bug is still present) with the original\n' - 'repository, and fails only after the bug disappears.\n' - 'Please fix your test command and start again.\n' - ) + sys.stderr.write(first_fail_message) sys.exit(1) + sys.stdout.write( + 'The bug is confirmed to exist in the initial repository.\n' + ) + + try: + try: + shrink_repository(test_command, cvsrepo) + except KeyboardInterrupt: + pass + finally: + try: + os.rmdir(tmpdir) + except Exception, e: + sys.stderr.write('ERROR: %s (ignored)\n' % (e,)) + + +if __name__ == '__main__': + main() + -try_delete_subdirs(cvsrepo) -try_delete_files(cvsrepo) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn cvs2svn-2.0.0/cvs2svn --- cvs2svn-1.5.x/cvs2svn 2007-01-28 22:52:33.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn 2007-08-15 22:53:54.000000000 +0200 @@ -2,7 +2,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -15,8 +15,6 @@ # history and logs, available at http://cvs2svn.tigris.org/. # ==================================================================== -VERSION = '1.5.1' - import sys # Make sure this Python is recent enough. Do this as early as possible, @@ -25,93 +23,12 @@ if sys.hexversion < 0x02020000: sys.stderr.write("ERROR: Python 2.2 or higher required.\n") sys.exit(1) -import os -import errno - -try: - # Try to get access to a bunch of encodings for use with --encoding. - # See http://cjkpython.i18n.org/ for details. 
- import iconv_codec -except ImportError: - pass - -from cvs2svn_lib.boolean import * from cvs2svn_lib.common import FatalException -from cvs2svn_lib.common import FatalError -from cvs2svn_lib.run_options import RunOptions -from cvs2svn_lib.context import Ctx -from cvs2svn_lib.pass_manager import PassManager -from cvs2svn_lib import passes - - -Ctx().VERSION = VERSION - -pass_manager = PassManager([ - passes.CollectRevsPass(), - passes.CollateSymbolsPass(), - passes.ResyncRevsPass(), - passes.SortRevsPass(), - passes.CreateDatabasesPass(), - passes.AggregateRevsPass(), - passes.SortSymbolsPass(), - passes.IndexSymbolsPass(), - passes.OutputPass(), - ]) - - -def main(): - # Convenience var, so we don't have to keep instantiating this Borg. - ctx = Ctx() - - run_options = RunOptions(pass_manager) - - # Make sure the tmp directory exists. Note that we don't check if - # it's empty -- we want to be able to use, for example, "." to hold - # tempfiles. But if we *did* want check if it were empty, we'd do - # something like os.stat(ctx.tmpdir)[stat.ST_NLINK], of course :-). - if not os.path.exists(ctx.tmpdir): - os.mkdir(ctx.tmpdir) - elif not os.path.isdir(ctx.tmpdir): - raise FatalError( - "cvs2svn tried to use '%s' for temporary files, but that path\n" - " exists and is not a directory. Please make it be a directory,\n" - " or specify some other directory for temporary files." - % (ctx.tmpdir,)) - - # But do lock the tmpdir, to avoid process clash. - try: - os.mkdir(os.path.join(ctx.tmpdir, 'cvs2svn.lock')) - except OSError, e: - if e.errno == errno.EACCES: - raise FatalError("Permission denied:" - + " No write access to directory '%s'." % ctx.tmpdir) - if e.errno == errno.EEXIST: - raise FatalError( - "cvs2svn is using directory '%s' for temporary files, but\n" - " subdirectory '%s/cvs2svn.lock' exists, indicating that another\n" - " cvs2svn process is currently using '%s' as its temporary\n" - " workspace. 
If you are certain that is not the case,\n" - " then remove the '%s/cvs2svn.lock' subdirectory." - % (ctx.tmpdir, ctx.tmpdir, ctx.tmpdir, ctx.tmpdir,)) - raise - - try: - if run_options.profiling: - import hotshot - prof = hotshot.Profile('cvs2svn.hotshot') - prof.runcall( - pass_manager.run, run_options.start_pass, run_options.end_pass) - prof.close() - else: - pass_manager.run(run_options.start_pass, run_options.end_pass) - finally: - try: os.rmdir(os.path.join(ctx.tmpdir, 'cvs2svn.lock')) - except: pass +from cvs2svn_lib.main import main -if __name__ == '__main__': try: - main() + main(sys.argv[0], sys.argv[1:]) except FatalException, e: sys.stderr.write(str(e)) sys.exit(1) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn-example.options cvs2svn-2.0.0/cvs2svn-example.options --- cvs2svn-1.5.x/cvs2svn-example.options 2006-10-03 16:43:32.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn-example.options 2007-08-15 22:53:54.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2006 CollabNet. All rights reserved. +# Copyright (c) 2006-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,12 +14,36 @@ # history and logs, available at http://cvs2svn.tigris.org/. # ==================================================================== -# An options file like this can be used to configure cvs2svn. The -# file is in Python syntax, but you don't need to know Python to +# ##################### +# ## PLEASE READ ME! ## +# ##################### +# +# This is a template for an options file that can be used to configure +# cvs2svn. Many options do not have defaults, so it is easier to copy +# this file and modify what you need rather than creating a new +# options file from scratch. +# +# This file is in Python syntax, but you don't need to know Python to # modify it. 
But if you *do* know Python, then you will be happy to # know that you can use arbitrary Python constructs to do fancy # configuration tricks. - +# +# But please be aware of the following: +# +# * In many places, leading whitespace is significant in Python (it is +# used instead of curly braces to group statements together). +# Therefore, if you don't know what you are doing, it is best to +# leave the whitespace as it is. +# +# * In normal strings, Python uses backslashes ("\") as an +# escape character. Therefore you need to be careful, especially +# when specifying regular expressions or Windows filenames. It is +# recommended that you use "raw strings" for these cases. +# Backslashes in raw strings are treated literally. A raw string is +# written by prefixing an "r" character to a string. Example: +# +# ctx.sort_executable = r'c:\windows\system32\sort.exe' +# # Two identifiers will have been defined before this file is executed, # and can be used freely within this file: # @@ -36,11 +60,15 @@ import re from cvs2svn_lib.boolean import * from cvs2svn_lib import config +from cvs2svn_lib.common import UTF8Encoder from cvs2svn_lib.log import Log from cvs2svn_lib.project import Project from cvs2svn_lib.output_option import DumpfileOutputOption from cvs2svn_lib.output_option import ExistingRepositoryOutputOption from cvs2svn_lib.output_option import NewRepositoryOutputOption +from cvs2svn_lib.revision_reader import RCSRevisionReader +from cvs2svn_lib.revision_reader import CVSRevisionReader +from cvs2svn_lib.checkout_internal import InternalRevisionReader from cvs2svn_lib.symbol_strategy import AllBranchRule from cvs2svn_lib.symbol_strategy import AllTagRule from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule @@ -52,15 +80,15 @@ from cvs2svn_lib.symbol_strategy import from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule from cvs2svn_lib.symbol_transform import RegexpSymbolTransform from cvs2svn_lib.property_setters import 
AutoPropsPropertySetter -from cvs2svn_lib.property_setters import BinaryFileDefaultMimeTypeSetter -from cvs2svn_lib.property_setters import BinaryFileEOLStyleSetter +from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter +from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter from cvs2svn_lib.property_setters import CVSRevisionNumberSetter from cvs2svn_lib.property_setters import DefaultEOLStyleSetter from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter from cvs2svn_lib.property_setters import ExecutablePropertySetter from cvs2svn_lib.property_setters import KeywordsPropertySetter from cvs2svn_lib.property_setters import MimeMapper - +from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter # To choose the level of logging output, uncomment one of the # following lines: @@ -68,6 +96,7 @@ from cvs2svn_lib.property_setters import #Log().log_level = Log.QUIET Log().log_level = Log.NORMAL #Log().log_level = Log.VERBOSE +#Log().log_level = Log.DEBUG # There are several possible options for where to put the output of a @@ -82,7 +111,7 @@ Log().log_level = Log.NORMAL # third (optional) argument can be specified to set the # --bdb-txn-nosync option on a bdb repository: ctx.output_option = NewRepositoryOutputOption( - 'svnrepo', # Path to repository + r'/path/to/svnrepo', # Path to repository #fs_type='fsfs', # Type of repository to create #bdb_txn_nosync=False, # For bdb repositories, this option can be added ) @@ -92,14 +121,14 @@ ctx.output_option = NewRepositoryOutputO # The argument is the filesystem path of an existing local SVN # repository (this repository must already exist): #ctx.output_option = ExistingRepositoryOutputOption( -# 'svnrepo', # Path to repository +# r'/path/to/svnrepo', # Path to repository # ) # Use this type of output option if you want the output of the # conversion to be written to a SVN dumpfile instead of committing # it into an actual repository: #ctx.output_option = 
DumpfileOutputOption( -# dumpfile_path='cvs2svn-dump', # Name of dumpfile to create +# dumpfile_path=r'/path/to/cvs2svn-dump', # Name of dumpfile to create # ) @@ -107,18 +136,48 @@ ctx.output_option = NewRepositoryOutputO # can be set to True to suppress cvs2svn output altogether: ctx.dry_run = False -# Change the following if cvs2svn should use "cvs" to read file -# versions out of *,v files. (The default is to use "co", which is -# part of RCS, and is much faster): -ctx.use_cvs = False - -# Set the name (and optionally the path) of some executables required -# by cvs2svn. 'co' is needed by default; 'cvs' is needed if -# ctx.use_cvs is set to True: -ctx.svnadmin_executable = 'svnadmin' -ctx.co_executable = 'co' -ctx.cvs_executable = 'cvs' -ctx.sort_executable = 'sort' +# The following option specifies how the revision contents of the RCS +# files should be read. +# +# The default selection is InternalRevisionReader, which uses built-in +# code that reads the RCS deltas while parsing the files in +# CollectRevsPass. This method is very fast but requires lots of +# temporary disk space. The disk space is required for (1) storing +# all of the RCS deltas, and (2) during OutputPass, keeping a copy of +# the full text of every revision that still has a descendant that +# hasn't yet been committed. Since this can include multiple +# revisions of each file (i.e., on multiple branches), the required +# amount of temporary space can potentially be many times the size of +# a checked out copy of the whole repository. Setting compress=True +# cuts the disk space requirements by about 50% at the price of +# increased CPU usage. Using compression usually speeds up the +# conversion due to the reduced I/O pressure, unless --tmpdir is on a +# RAM disk. This method does not expand CVS's "Log" keywords. +# +# The second possibility is RCSRevisionReader, which uses RCS's "co" +# program to extract the revision contents of the RCS files during +# OutputPass. 
This option doesn't require any temporary space, but it +# is relatively slow because (1) "co" has to be executed very many +# times; and (2) "co" itself has to assemble many file deltas to +# compute the contents of a particular revision. The constructor +# argument specifies how to invoke the "co" executable. +# +# The third possibility is CVSRevisionReader, which uses the "cvs" +# program to extract the revision contents out of the RCS files during +# OutputPass. This option doesn't require any temporary space, but it +# is the slowest of all, because "cvs" is considerably slower than +# "co". However, it works in some situations where RCSRevisionReader +# fails; see the HTML documentation of the "--use-cvs" option for +# details. The constructor argument specifies how to invoke the "cvs" +# executable. +ctx.revision_reader = InternalRevisionReader(compress=True) +#ctx.revision_reader = RCSRevisionReader(co_executable=r'co') +#ctx.revision_reader = CVSRevisionReader(cvs_executable=r'cvs') + +# Set the name (and optionally the path) of some other executables +# required by cvs2svn: +ctx.svnadmin_executable = r'svnadmin' +ctx.sort_executable = r'sort' # Change the following line to True if the conversion should only # include the trunk of the repository (i.e., all branches and tags @@ -129,22 +188,32 @@ ctx.trunk_only = False # directory once the last file has been deleted from it: ctx.prune = False -# A list of encodings that should be tried when converting filenames, -# author names, log messages, etc. to UTF8. The encoders are tried in -# order in 'strict' mode until one of them succeeds. If none -# succeeds, then ctx.fallback_encoding is used in lossy 'replace' mode -# (if it is configured): -ctx.encoding = [ +# How to convert author names and log messages to UTF8. The first +# argument is a list of encoders that are tried in order in 'strict' +# mode until one of them succeeds. 
If none of those succeeds, then +# fallback_encoding is used in lossy 'replace' mode (if it is +# specified). Setting a fallback encoder ensures that the encoder +# always succeeds, but it can cause information loss. +ctx.utf8_encoder = UTF8Encoder( + [ #'latin1', + #'utf8', 'ascii', - ] + ], + #fallback_encoder='ascii' + ) -# The encoding to use if all of the encodings listed in ctx.encoding -# fail. This encoding is used in 'replace' mode, which always -# succeeds but can cause information loss. To enable this feature, -# set the following value to the name of the desired encoding (e.g., -# 'ascii'). -ctx.fallback_encoding = None +# The following encoder is used to convert filenames to UTF8. See the +# documentation for ctx.utf8_encoder for more information. You might +# want to set this encoder stricter than ctx.utf8_encoder. +ctx.filename_utf8_encoder = UTF8Encoder( + [ + #'latin1', + #'utf8', + 'ascii', + ], + #fallback_encoder='ascii' + ) # The basic strategy for converting symbols (this should usually be # left unchanged). 
A CVS symbol might be used as a tag in one file @@ -160,15 +229,15 @@ ctx.symbol_strategy = RuleBasedSymbolStr # To force all symbols matching a regular expression to be converted # as branches, add rules like the following: -#ctx.symbol_strategy.add_rule(ForceBranchRegexpStrategyRule('branch.*')) +#ctx.symbol_strategy.add_rule(ForceBranchRegexpStrategyRule(r'branch.*')) # To force all symbols matching a regular expression to be converted # as tags, add rules like the following: -#ctx.symbol_strategy.add_rule(ForceTagRegexpStrategyRule('tag.*')) +#ctx.symbol_strategy.add_rule(ForceTagRegexpStrategyRule(r'tag.*')) # To force all symbols matching a regular expression to be excluded # from the conversion, add rules like the following: -#ctx.symbol_strategy.add_rule(ExcludeRegexpStrategyRule('unknown-.*')) +#ctx.symbol_strategy.add_rule(ExcludeRegexpStrategyRule(r'unknown-.*')) # Usually you want this rule, to convert unambiguous symbols (symbols # that were only ever used as tags or only ever used as branches in @@ -194,7 +263,9 @@ ctx.symbol_strategy.add_rule(Unambiguous # often as branches or tags: #ctx.symbol_strategy.add_rule(HeuristicStrategyRule()) -# Specify a username to be used for commits generated by cvs2svn. If this options is set to None then no username will be used for such commits: +# Specify a username to be used for commits generated by cvs2svn. If +# this option is set to None then no username will be used for such +# commits: ctx.username = None #ctx.username = 'cvs2svn' @@ -203,77 +274,121 @@ ctx.username = None # the rules are tried one by one. Any rule can add or suppress one or # more svn properties. Typically the rules will not overwrite # properties set by a previous rule (though they are free to do so). 
-ctx.svn_property_setters = [ - # Set the svn:executable flag on any files that are marked in CVS as - # being executable: - ExecutablePropertySetter(), - - # Omit the svn:eol-style property from any files that are listed as - # binary in CVS: - BinaryFileEOLStyleSetter(), - - # To read mime types from a file, uncomment the following line and - # specify a filename: - #MimeMapper('/etc/mime.types'), - +ctx.svn_property_setters.extend([ # To read auto-props rules from a file, uncomment the following line # and specify a filename. The boolean argument specifies whether # case should be ignored when matching filenames to the filename # patterns found in the auto-props file: #AutoPropsPropertySetter( - # '/home/username/.subversion/config', - # False, + # r'/home/username/.subversion/config', + # ignore_case=True, # ), + # To read mime types from a file, uncomment the following line and + # specify a filename: + #MimeMapper(r'/etc/mime.types'), + + # Omit the svn:eol-style property from any files that are listed + # as binary (i.e., mode '-kb') in CVS: + CVSBinaryFileEOLStyleSetter(), + # If the file is binary and its svn:mime-type property is not yet # set, set svn:mime-type to 'application/octet-stream'. - BinaryFileDefaultMimeTypeSetter(), + CVSBinaryFileDefaultMimeTypeSetter(), # To try to determine the eol-style from the mime type, uncomment # the following line: #EOLStyleFromMimeTypeSetter(), - # Choose one of the following lines to set the default svn:eol-style - # if none of the above rules applied. The argument is the - # svn:eol-style that should be applied, or None if no svn:eol-style - # should be set. - #DefaultEOLStyleSetter(None) - DefaultEOLStyleSetter('native'), + # Choose one of the following lines to set the default + # svn:eol-style if none of the above rules applied. The argument + # is the svn:eol-style that should be applied, or None if no + # svn:eol-style should be set (i.e., the file should be treated as + # binary). 
+ # + # The default is to treat all files as binary unless one of the + # previous rules has determined otherwise, because this is the + # safest approach. However, if you have been diligent about + # marking binary files with -kb in CVS and/or you have used the + # above rules to definitely mark binary files as binary, then you + # might prefer to use 'native' as the default, as it is usually + # the most convenient setting for text files. Other possible + # options: 'CRLF', 'CR', 'LF'. + DefaultEOLStyleSetter(None), + #DefaultEOLStyleSetter('native'), + + # Prevent svn:keywords from being set on files that have + # svn:eol-style unset. + SVNBinaryFileKeywordsPropertySetter(), # If svn:keywords has not been set yet, set it based on the file's # CVS mode: KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE), + # Set the svn:executable flag on any files that are marked in CVS as + # being executable: + ExecutablePropertySetter(), + # Uncomment the following line to include the original CVS revision # numbers as file properties in the SVN archive: #CVSRevisionNumberSetter(), - ] + ]) # The directory to use for temporary files: -ctx.tmpdir = 'tmp' +ctx.tmpdir = r'cvs2svn-tmp' # To skip the cleanup of temporary files, uncomment the following # option: #ctx.skip_cleanup = True + +# In CVS, it is perfectly possible to make a single commit that +# affects more than one project or more than one branch of a single +# project. Subversion also allows such commits. Therefore, by +# default, when cvs2svn sees what looks like a cross-project or +# cross-branch CVS commit, it converts it into a +# cross-project/cross-branch Subversion commit. +# +# However, other tools and SCMs have trouble representing +# cross-project or cross-branch commits. (For example, Trac's Revtree +# plugin, http://www.trac-hacks.org/wiki/RevtreePlugin is confused by +# such commits.) Therefore, we provide the following two options to +# allow cross-project/cross-branch commits to be suppressed. 
+ # To prevent CVS commits from different projects from being merged # into single SVN commits, change this option to False: ctx.cross_project_commits = True +# To prevent CVS commits on different branches from being merged into +# single SVN commits, change this option to False: +ctx.cross_branch_commits = True + + +# By default, it is a fatal error for a CVS ",v" file to appear both +# inside and outside of an "Attic" subdirectory (this should never +# happen, but frequently occurs due to botched repository +# administration). If you would like to retain both versions of such +# files, change the following option to True, and the attic version of +# the file will be left in an SVN subdirectory called "Attic": +ctx.retain_conflicting_attic_files = False + # Now use stanzas like the following to define CVS projects that # should be converted. The arguments are: # # - The filesystem path of the project within the CVS repository. # # - The path that should be used for the "trunk" directory of this -# project within the SVN repository. +# project within the SVN repository. This is an SVN path, so it +# should always use forward slashes ("/"). # # - The path that should be used for the "branches" directory of this -# project within the SVN repository. +# project within the SVN repository. This is an SVN path, so it +# should always use forward slashes ("/"). # # - The path that should be used for the "tags" directory of this -# project within the SVN repository. +# project within the SVN repository. This is an SVN path, so it +# should always use forward slashes ("/"). # # - A list of symbol transformations that can be used to rename # symbols in this project. Each entry is a tuple (pattern, @@ -285,19 +400,19 @@ ctx.cross_project_commits = True # r'\1' or r'\g<name>'). 
Typically you will want to use raw strings # (strings with a preceding 'r', like shown in the examples) for the # regexp and its replacement to avoid backslash substitution within -# those strings.""" +# those strings. # Create the default project (using ctx.trunk, ctx.branches, and ctx.tags): ctx.add_project( Project( - 'test-data/main-cvsrepos', + r'test-data/main-cvsrepos', 'trunk', 'branches', 'tags', symbol_transforms=[ - #RegexpSymbolTransform(r'^release-(\d+)_(\d+)$', + #RegexpSymbolTransform(r'release-(\d+)_(\d+)', # r'release-\1.\2'), - #RegexpSymbolTransform(r'^release-(\d+)_(\d+)_(\d+)$', + #RegexpSymbolTransform(r'release-(\d+)_(\d+)_(\d+)', # r'release-\1.\2.\3'), ], ) @@ -307,14 +422,14 @@ ctx.add_project( # and projA/tags: #ctx.add_project( # Project( -# 'my/cvsrepo/projA', +# r'my/cvsrepo/projA', # 'projA/trunk', # 'projA/branches', # 'projA/tags', # symbol_transforms=[ -# #RegexpSymbolTransform(r'^release-(\d+)_(\d+)$', +# #RegexpSymbolTransform(r'release-(\d+)_(\d+)', # # r'release-\1.\2'), -# #RegexpSymbolTransform(r'^release-(\d+)_(\d+)_(\d+)$', +# #RegexpSymbolTransform(r'release-(\d+)_(\d+)_(\d+)', # # r'release-\1.\2.\3'), # ], # ) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn.1 cvs2svn-2.0.0/cvs2svn.1 --- cvs2svn-1.5.x/cvs2svn.1 2006-10-03 16:43:32.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn.1 2007-08-15 22:53:54.000000000 +0200 @@ -5,74 +5,68 @@ cvs2svn \- convert a cvs repository into a subversion repository .SH SYNOPSIS .B cvs2svn -[\fIOPTION\fR]... \fI-s svn-repos-path cvs-repos-path\fR +[\fIOPTION\fR]... \fIOUTPUT-OPTION CVS-REPOS-PATH\fR .br .B cvs2svn -[\fIOPTION\fR]... \fI--dumpfile=path cvs-repos-path\fR +[\fIOPTION\fR]... \fI--options=PATH\fR .SH DESCRIPTION Create a new Subversion repository based on the version history stored in a CVS repository. Each CVS commit will be mirrored in the Subversion repository, including such information as date of commit and id of the committer. 
-.TP -\fB--help\fR, \fB-h\fR -Print the usage message and exit with success. -.TP -\fB--help-passes\fR -Print the numbers and names of the conversion passes and exit with -success. -.TP -\fB--version\fR -Print the version number. -.TP -\fB--verbose\fR, \fB-v\fR -Print more information while running. -.TP -\fB--quiet\fR, \fB-q\fR -Print less information while running. This option may be specified -twice to suppress all non-error output. +.P +\fICVS-REPOS-PATH\fR is the filesystem path of the part of the CVS +repository that you want to convert. It is not possible to convert a +CVS repository to which you only have remote access; see the FAQ for +more information. This path doesn't have to be the top level +directory of a CVS repository; it can point at a project within a +repository, in which case only that project will be converted. This +path or one of its parent directories has to contain a subdirectory +called CVSROOT (though the CVSROOT directory can be empty). +.P +Multiple CVS repositories can be converted into a single Subversion +repository in a single run of cvs2svn, but only by using an +\fB--options\fR file. +.SH "OPTIONS FILE" .TP \fB--options\fR=\fIpath\fR Read the conversion options from \fIpath\fR instead of from the -command line. See the documentation for more information. Only the -following options are allowed in combination with \fB--options\fR: -\fB-h\fR/\fB--help\fR, \fB--help-passes\fR, \fB--version\fR, -\fB-v\fR/\fB--verbose\fR, \fB-q\fR/\fB--quiet\fR, \fB-p\fR, +command line. This option allows far more conversion flexibility than +can be achieved using the command-line alone. See the documentation +for more information. Only the following command-line options are +allowed in combination with \fB--options\fR: \fB-h\fR/\fB--help\fR, +\fB--help-passes\fR, \fB--version\fR, \fB-v\fR/\fB--verbose\fR, +\fB-q\fR/\fB--quiet\fR, \fB-p\fR/\fB--pass\fR/\fB--passes\fR, \fB--dry-run\fR, and \fB--profile\fR. 
+.SH "OUTPUT OPTIONS" .TP -\fB-s\fR \fIpath\fR -Load CVS repository into the Subversion repository located at PATH. If there -is no Subversion repository at \fIpath\fR, create a new one. +\fB-s\fR, \fB--svnrepos\fR \fIpath\fR +Load CVS repository into the Subversion repository located at +\fIpath\fR. If there is no Subversion repository at \fIpath\fR, +create a new one. .TP -\fB-p\fR \fIpass\fR -Execute only pass \fIpass\fR of the conversion. \fIpass\fR can be -specified by name or by number (see \fB--help-passes\fR). +\fB--existing-svnrepos\fR +Load output into existing SVN repository, instead of creating a new +repository. The existing repository must either be empty or contain +no paths that conflict with paths that will be output by cvs2svn. +Please note that you need write permission for the repository files. .TP -\fB-p\fR [\fIstart\fR]:[\fIend\fR] -Execute passes \fIstart\fR through \fIend\fR of the conversion -(inclusive). \fIstart\fR and \fIend\fR can be specified by name or by -number (see \fB--help-passes\fR). If \fIstart\fR or \fIend\fR is -missing, it defaults to the first or last pass, respectively. For -this to work the earlier passes must have been completed before on the -same CVS repository, and the generated data files must be in the -temporary directory (see \fB--tmpdir\fR). +\fB--fs-type\fR +Pass \fI--fs-type\fR=\fItype\fR to "svnadmin create" when creating a +new repository. .TP -\fB--existing-svnrepos\fR -Load into existing SVN repository, instead of creating a new -repository. Please note that you need write permission for the -repository files. +\fB--bdb-txn-nosync\fR +Pass \fI--bdb-txn-nosync\fR to "svnadmin create" when creating a new +BDB-style Subversion repository. .TP \fB--dumpfile\fR=\fIpath\fR -Just produce a dumpfile; don't commit to an SVN repository. Use -\fIpath\fR as the name of the dumpfile. +Just produce a dumpfile; don't commit to an SVN repository. Write the +dumpfile to \fIpath\fR. 
.TP \fB--dry-run\fR Do not create a repository or a dumpfile; just print the details of what cvs2svn would do if it were really converting your repository. -.TP -\fB--use-cvs\fR -Use CVS instead of RCS 'co' to extract data (only use this if having -problems with RCS, as CVS is much slower). +.SH "CONVERSION OPTIONS" .TP \fB--trunk-only\fR Convert only trunk commits, not tags nor branches. @@ -90,15 +84,17 @@ Set the top-level path to use for tags i The default is \fItags\fR. .TP \fB--no-prune\fR -When all files are deleted from a directory in the Subversion repository, -don't delete the empty directory (the default is to delete any empty -directories. +When all files are deleted from a directory in the Subversion +repository, don't delete the empty directory (the default is to delete +any empty directories). .TP \fB--encoding\fR=\fIencoding\fR Use \fIencoding\fR as the encoding for filenames, log messages, and author names in the CVS repos. This option may be specified multiple times, in which case the encodings are tried in order -until one succeeds. Default: ascii. +until one succeeds. Default: ascii. See +http://docs.python.org/lib/standard-encodings.html for a list of other +standard encodings. .TP \fB--fallback-encoding\fR=\fIencoding\fR If all of the encodings specified with \fB--encoding\fR fail, then @@ -106,15 +102,24 @@ fall back to using \fIencoding\fR in los this option may cause information to be lost, but at least it allows the conversion to run to completion. Default: disabled. .TP +\fB--symbol-transform\fR=\fIpattern\fR:\fIreplacement\fR +Transform RCS/CVS symbol names before entering them into Subversion. +\fIpattern\fR is a Python regexp pattern that is matched against the +entire symbol name; \fIreplacement\fR is a replacement using Python's +regexp reference syntax. You may specify any number of these options; +they will be applied in the order given on the command line.
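The `--symbol-transform` semantics described above — the pattern must match the whole symbol name, and multiple rules apply in order — can be sketched with Python's `re` module. The helper name and the sample rule are illustrative, not cvs2svn API:

```python
import re

def transform_symbol(name, rules):
    """Apply --symbol-transform style rules in command-line order.

    Each rule is a (pattern, replacement) pair.  The pattern must match
    the ENTIRE symbol name (hence re.fullmatch); a rule that does not
    match leaves the name unchanged for the next rule."""
    for pattern, replacement in rules:
        m = re.fullmatch(pattern, name)
        if m:
            name = m.expand(replacement)
    return name
```

For example, the rule `release-(\d+)_(\d+)` with replacement `release-\1.\2` turns a CVS tag like `release-1_2` into `release-1.2` while leaving non-matching symbols alone.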
+.TP \fB--force-branch\fR=\fIregexp\fR Force symbols whose names match \fIregexp\fR to be branches. +\fIregexp\fR must match the whole symbol name. .TP \fB--force-tag\fR=\fIregexp\fR -Force symbols whose names match \fIregexp\fR to be tags. +Force symbols whose names match \fIregexp\fR to be tags. \fIregexp\fR +must match the whole symbol name. .TP \fB--exclude\fR=\fIregexp\fR Exclude branches and tags whose names match \fIregexp\fR from the -conversion. +conversion. \fIregexp\fR must match the whole symbol name. .TP \fB--symbol-default\fR=\fIopt\fR Specify how to convert ambiguous symbols (those that appear in the CVS @@ -124,32 +129,23 @@ symbol as a tag), `heuristic' (decide ho symbol based on whether it was used more often as a branch/tag in CVS), or `strict' (no default; every ambiguous symbol has to be resolved manually using \fB--force-branch\fR, \fB--force-tag\fR, -or \fB--exclude\fR). +or \fB--exclude\fR). The default is `strict'. .TP -\fB--symbol-transform\fR=\fIpattern\fR:\fIreplacement\fR -Transform RCS/CVS symbol names before entering them into Subversion. -\fIpattern\fR is a Python regexp pattern and \fIreplacement\fR is a -replacement using Python's regexp reference syntax. You may specify any -number of these options; they will be applied in the order given on -the command line. -.br -This option can be useful if you're converting a repository in which the -developer used directory-wide symbol names like 1_0, 1_1 and 2_1 as a -kludgy form of release tagging (the C-x v s command in Emacs VC mode -encourages this practice). +\fB--no-cross-branch-commits\fR +Prevent the creation of SVN commits that affect multiple branches or +trunk and a branch. Instead, break such changesets into multiple +commits, one per branch. +.TP +\fB--retain-conflicting-attic-files\fR +If a file appears both inside and outside of the CVS attic, retain the +attic version in an SVN subdirectory called `Attic'. (Normally this +situation is treated as a fatal error.)
.TP \fB--username\fR=\fIname\fR -Set the default username to \fIname\fR when cvs2svn doesn't have a username -from the CVS repository to work with. This happens when a branch or tag is -created. The default is to use no author at all for such commits. -.TP -\fB--fs-type\fR -Pass \fI--fs-type\fR=\fItype\fR to "svnadmin create" when creating a -new repository. -.TP -\fB--bdb-txn-nosync\fR -Pass \fI--bdb-txn-nosync\fR to "svnadmin create" when creating a new -repository. +Set the default username to \fIname\fR when cvs2svn needs to generate +a commit for which CVS does not record the original username. This +happens when a branch or tag is created. The default is to use no +author at all for such commits. .TP \fB--cvs-revnums\fR Record CVS revision numbers as file properties in the Subversion @@ -161,18 +157,6 @@ file is changed within Subversion.) Specify an apache-style mime.types \fIfile\fR for setting svn:mime-type. .TP -\fB--auto-props\fR=\fIfile\fR -Specify a file in the format of Subversion's config file, whose -[auto-props] section can be used to set arbitrary properties on files -in the Subversion repository based on their filenames. (The -[auto-props] section header must be present; other sections of the -config file, including the enable-auto-props setting, are ignored.) -Filenames are matched to the filename patterns case-sensitively unless -the --auto-props-ignore-case option is specified. -.TP -\fB--auto-props-ignore-case\fR -Ignore case when pattern-matching auto-props patterns. -.TP \fB--eol-from-mime-type\fR For files that don't have the kb expansion mode but have a known mime type, set the eol-style based on the mime type. For such files, set @@ -181,47 +165,105 @@ leave it unset (i.e., no EOL translation unknown mime types are not affected by this option. This option has no effect unless the \fB--mime-types\fR option is also specified. 
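The `--eol-from-mime-type` decision just described has three outcomes: text types get `native`, other known types get no eol-style, and unknown types are untouched by the rule. A small sketch of that stated behavior (this is a plausible reading of the man page, not the cvs2svn implementation; the `UNAFFECTED` sentinel is an invention of this sketch):

```python
# Sketch of the --eol-from-mime-type rule: files with a known "text/"
# mime type get svn:eol-style 'native'; files with any other known mime
# type are left with no eol-style (no EOL translation); files whose
# mime type is unknown are not affected by this rule at all.

UNAFFECTED = object()  # sentinel: the rule does not apply to this file

def eol_style_from_mime_type(mime_type):
    if mime_type is None:               # unknown mime type
        return UNAFFECTED
    if mime_type.startswith('text/'):
        return 'native'
    return None                         # known non-text type: leave unset
```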
.TP -\fB--no-default-eol\fR -Files that don't have the kb expansion mode and (if -\fB--eol-from-mime-type\fR is set) unknown mime type usually have -their svn:eol-style property to "native". If this option is -specified, such files are left with no eol-style (i.e., no EOL -translation). +\fB--auto-props\fR=\fIfile\fR +Specify a file in the format of Subversion's config file, whose +[auto-props] section can be used to set arbitrary properties on files +in the Subversion repository based on their filenames. (The +[auto-props] section header must be present; other sections of the +config file, including the enable-auto-props setting, are ignored.) +Filenames are matched to the filename patterns case-insensitively. +.TP +\fB--default-eol\fR=\fIstyle\fR +Set svn:eol-style to \fIstyle\fR for files that don't have the CVS +`kb' expansion mode and whose end-of-line translation mode hasn't been +determined by one of the other options. \fIstyle\fR must be `binary' +(default), `native', `CRLF', `LF', or `CR'. .TP \fB--keywords-off\fR By default, cvs2svn sets svn:keywords on CVS files to "author id date" -if the mode of the RCS file in question is either kv, kvl or not -kb. If you use the --keywords-off switch, cvs2svn will not set +if the mode of the RCS file in question is either kv, kvl or unset. +If you use the --keywords-off switch, cvs2svn will not set svn:keywords for any file. While this will not touch the keywords in the contents of your files, Subversion will not expand them. +.SH "EXTRACTION OPTIONS" .TP -\fB--tmpdir\fR=\fIpath\fR -Set the \fIpath\fR to use for temporary data. Default is the current -directory. +\fB--use-internal-co\fR +Use internal code to extract revision contents. This is up to 50% +faster than using \fB--use-rcs\fR, but needs a lot of disk space: +Roughly the size of your CVS repository plus the peak size of a +complete checkout of the repository with all branches that existed and +still had commits pending at a given time. 
This option is the +default. .TP -\fB--skip-cleanup\fR -Prevent the deletion of temporary files. +\fB--use-rcs\fR +Use RCS 'co' to extract revision contents. .TP -\fB--profile\fR -Profile with 'hotshot' (into file \fIcvs2svn.hotshot\fR). +\fB--use-cvs\fR +Use CVS to extract revision contents (only use this if having +problems with \fB--use-internal-co\fR or \fB--use-rcs\fR, as those +options are much faster). +.SH "ENVIRONMENT OPTIONS" +.TP +\fB--tmpdir\fR=\fIpath\fR +Set the \fIpath\fR to use for temporary data. Default is a directory +called \fIcvs2svn-tmp\fR under the current directory. .TP \fB--svnadmin\fR=\fIpath\fR -Path to the \fIsvnadmin\fR program. +Path to the \fIsvnadmin\fR program. (\fIsvnadmin\fR is needed when +the \fB-s\fR/\fB--svnrepos\fR output option is used.) .TP \fB--co\fR=\fIpath\fR -Path to the \fIco\fR program. (\fIco\fR is needed if -\fB--use-cvs\fR is not specified.) +Path to the \fIco\fR program. (\fIco\fR is needed if the +\fB--use-rcs\fR option is used.) .TP \fB--cvs\fR=\fIpath\fR -Path to the \fIcvs\fR program. (\fIcvs\fR is needed if -\fB--use-cvs\fR is specified.) +Path to the \fIcvs\fR program. (\fIcvs\fR is needed if the +\fB--use-cvs\fR option is used.) .TP \fB--sort\fR=\fIpath\fR Path to the GNU \fIsort\fR program. (cvs2svn requires GNU sort.) +.SH "PARTIAL CONVERSIONS" +.TP +\fB-p\fR, \fB--pass\fR \fIpass\fR +Execute only pass \fIpass\fR of the conversion. \fIpass\fR can be +specified by name or by number (see \fB--help-passes\fR). +.TP +\fB-p\fR, \fB--passes\fR [\fIstart\fR]:[\fIend\fR] +Execute passes \fIstart\fR through \fIend\fR of the conversion +(inclusive). \fIstart\fR and \fIend\fR can be specified by name or by +number (see \fB--help-passes\fR). If \fIstart\fR or \fIend\fR is +missing, it defaults to the first or last pass, respectively. For +this to work the earlier passes must have been completed before on the +same CVS repository, and the generated data files must be in the +temporary directory (see \fB--tmpdir\fR). 
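The `[start]:[end]` pass syntax above, where each end may be a name, a number, or omitted, can be illustrated with a small parser. The pass names in the test are placeholders, not cvs2svn's actual pass list:

```python
def parse_pass_range(spec, pass_names):
    """Parse a -p/--pass/--passes argument.

    SPEC is either a single pass or '[start]:[end]'.  Passes may be
    given by name or by 1-based number; a missing start or end defaults
    to the first or last pass, respectively."""
    def resolve(token, default):
        if not token:
            return default
        if token.isdigit():
            return int(token)
        return pass_names.index(token) + 1   # name -> 1-based number

    if ':' in spec:
        start_s, end_s = spec.split(':', 1)
        return resolve(start_s, 1), resolve(end_s, len(pass_names))
    n = resolve(spec, None)
    return n, n
```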
+.SH "INFORMATION OPTIONS" +.TP +\fB--version\fR +Print the version number. +.TP +\fB-h\fR, \fB--help\fR +Print the usage message and exit with success. +.TP +\fB--help-passes\fR +Print the numbers and names of the conversion passes and exit with +success. +.TP +\fB-v\fR, \fB--verbose\fR +Print more information while running. This option may be specified +twice to output voluminous debugging information. +.TP +\fB-q\fR, \fB--quiet\fR +Print less information while running. This option may be specified +twice to suppress all non-error output. +.TP +\fB--skip-cleanup\fR +Prevent the deletion of temporary files. +.TP +\fB--profile\fR +Profile with 'hotshot' (into file \fIcvs2svn.hotshot\fR). .SH FILES -The current directory (or the directory specified by \fB--tmpdir\fR) -is used as scratch space for data files of the form -\fIcvs2svn-data.*\fR and a dumpfile named \fIcvs2svn-dump\fR. +A directory called \fIcvs2svn-tmp\fR (or the directory specified by +\fB--tmpdir\fR) is used as scratch space for temporary data files. .SH AUTHORS Main authors are: .br diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/.dired cvs2svn-2.0.0/cvs2svn_lib/.dired --- cvs2svn-1.5.x/cvs2svn_lib/.dired 2006-08-21 17:18:31.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/.dired 1970-01-01 01:00:00.000000000 +0100 @@ -1,4 +0,0 @@ -Local Variables: -dired-omit-files-p: t -dired-omit-extensions: (".pyc" ".pyo") -End: diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/Makefile cvs2svn-2.0.0/cvs2svn_lib/Makefile --- cvs2svn-1.5.x/cvs2svn_lib/Makefile 2006-06-08 10:47:12.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/Makefile 1970-01-01 01:00:00.000000000 +0100 @@ -1,11 +0,0 @@ -# This is a convenience Makefile, allowing "make" to be invoked in the -# cvs2svn_lib directory. It simply re-invokes make in the main -# project directory. - -.PHONY: all -all:: - $(MAKE) -C .. - -%: - $(MAKE) -C .. 
$@ - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/artifact_manager.py cvs2svn-2.0.0/cvs2svn_lib/artifact_manager.py --- cvs2svn-1.5.x/cvs2svn_lib/artifact_manager.py 2006-06-19 21:18:35.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/artifact_manager.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -29,9 +29,7 @@ class Artifact: """An artifact that can be created, used across cvs2svn passes, then cleaned up.""" - def __init__(self, name): - self.name = name - + def __init__(self): # A set of passes that need this artifact. This field is # maintained by ArtifactManager. self._passes_needed = set() @@ -41,21 +39,21 @@ class Artifact: pass - def __str__(self): - return self.name - class TempFileArtifact(Artifact): """A temporary file that can be used across cvs2svn passes.""" def __init__(self, basename): - Artifact.__init__(self, basename) + Artifact.__init__(self) self.filename = Ctx().get_temp_filename(basename) def cleanup(self): Log().verbose("Deleting", self.filename) os.unlink(self.filename) + def __str__(self): + return 'Temporary file %r' % self.filename + class ArtifactNotActiveError(Exception): """An artifact was requested when no passes that have registered @@ -75,13 +73,18 @@ class ArtifactManager: To use this class: - - Call register_artifact() or register_temp_file() for all possible - artifacts (even those that should have been created by previous - cvs2svn runs). - - - Call register_artifact_needed() or register_temp_file_needed() for - any artifact that are needed by any pass (even those passes that - won't be executed during this cvs2svn run). 
+ - Call artifact_manager[name] = artifact once for each known + artifact. + + - Call artifact_manager.creates(which_pass, name) to indicate that + WHICH_PASS is the pass that creates the artifact named NAME. + + - Call artifact_manager.uses(which_pass, name) to indicate that + WHICH_PASS needs to use the artifact named NAME. + + There are also helper methods register_temp_file(), + register_artifact_needed(), and register_temp_file_needed() which + combine some useful operations. Then, in pass order: @@ -116,39 +119,64 @@ class ArtifactManager: # A set of passes that are currently being executed. self._active_passes = set() - def register_artifact(self, artifact, which_pass): - """Register a new ARTIFACT for management by this class. - WHICH_PASS is the pass that creates ARTIFACT, and is also assumed - to need it. It is an error to registier the same artifact more - than once.""" - - assert artifact.name not in self._artifacts - self._artifacts[artifact.name] = artifact - self.register_artifact_needed(artifact.name, which_pass) + def __setitem__(self, name, artifact): + """Add ARTIFACT to the list of artifacts that we manage. - def register_temp_file(self, basename, which_pass): - """Register a temporary file with base name BASENAME as an - artifact. Return the filename of the temporary file.""" + Store it under NAME.""" - artifact = TempFileArtifact(basename) - self.register_artifact(artifact, which_pass) - return artifact.filename + assert name not in self._artifacts + self._artifacts[name] = artifact - def get_artifact(self, artifact_name): - """Return the artifact with the specified name. If the artifact - does not currently exist, raise a KeyError. If it is not - registered as being needed by one of the active passes, raise an - ArtifactNotActiveError.""" + def __getitem__(self, name): + """Return the artifact with the specified name. - artifact = self._artifacts[artifact_name] + If the artifact does not currently exist, raise a KeyError. 
If it + is not registered as being needed by one of the active passes, + raise an ArtifactNotActiveError.""" + + artifact = self._artifacts[name] for active_pass in self._active_passes: if artifact in self._pass_needs[active_pass]: # OK - break + return artifact else: - raise ArtifactNotActiveError(artifact_name) + raise ArtifactNotActiveError(name) - return artifact + def creates(self, which_pass, name): + """Register that WHICH_PASS creates the artifact named NAME. + + An artifact with this name must already have been registered.""" + + artifact = self._artifacts[name] + + # An artifact is automatically "needed" in the pass in which it is + # created: + self.uses(which_pass, name) + + def uses(self, which_pass, name): + """Register that WHICH_PASS uses the artifact named NAME. + + An artifact with this name must already have been registered.""" + + artifact = self._artifacts[name] + artifact._passes_needed.add(which_pass) + if which_pass in self._pass_needs: + self._pass_needs[which_pass].add(artifact) + else: + self._pass_needs[which_pass] = set([artifact]) + + def register_temp_file(self, basename, which_pass): + """Register a temporary file with base name BASENAME as an artifact. + + Return the filename of the temporary file.""" + + artifact = TempFileArtifact(basename) + self[basename] = artifact + self.creates(which_pass, basename) + return artifact.filename + + def get_artifact(self, artifact_name): + return self[artifact_name] def get_temp_file(self, basename): """Return the filename of the temporary file with the specified BASENAME. @@ -159,7 +187,8 @@ class ArtifactManager: return self.get_artifact(basename).filename def register_artifact_needed(self, artifact_name, which_pass): - """Register that WHICH_PASS needs the artifact named ARTIFACT_NAME. + """Register that WHICH_PASS uses the artifact named ARTIFACT_NAME. 
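The registration protocol in the docstrings above — add artifacts by name, declare which pass creates or uses each one, and only hand an artifact out while such a pass is active — can be shown with a stripped-down manager. This is a toy mirroring the described behavior, not the real `ArtifactManager`:

```python
class ArtifactNotActiveError(Exception):
    """Raised when an artifact is requested outside any pass that uses it."""


class MiniArtifactManager:
    def __init__(self):
        self._artifacts = {}      # name -> artifact
        self._pass_needs = {}     # pass -> set of artifact names
        self._active_passes = set()

    def __setitem__(self, name, artifact):
        assert name not in self._artifacts
        self._artifacts[name] = artifact

    def uses(self, which_pass, name):
        assert name in self._artifacts
        self._pass_needs.setdefault(which_pass, set()).add(name)

    def creates(self, which_pass, name):
        # A pass that creates an artifact implicitly needs it too.
        self.uses(which_pass, name)

    def pass_started(self, which_pass):
        self._active_passes.add(which_pass)

    def __getitem__(self, name):
        for p in self._active_passes:
            if name in self._pass_needs.get(p, set()):
                return self._artifacts[name]
        raise ArtifactNotActiveError(name)
```

The lookup guard is the point of the design: a pass cannot accidentally read a temporary file it never registered a dependency on.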
+ An artifact with this name must already have been registered.""" artifact = self._artifacts[artifact_name] @@ -170,8 +199,10 @@ class ArtifactManager: self._pass_needs[which_pass] = set([artifact,]) def register_temp_file_needed(self, basename, which_pass): - """Register that the temporary file with base name BASENAME is - needed by WHICH_PASS.""" + """Register that a temporary file is needed by WHICH_PASS. + + Register that the temporary file with base name BASENAME is needed + by WHICH_PASS.""" self.register_artifact_needed(basename, which_pass) @@ -211,7 +242,9 @@ class ArtifactManager: self._active_passes.add(which_pass) def pass_continued(self, which_pass): - """WHICH_PASS, which has already been started, will be continued + """WHICH_PASS will be continued during the next program run. + + WHICH_PASS, which has already been started, will be continued during the next program run. Unregister any artifacts that would be cleaned up at the end of WHICH_PASS without actually cleaning them up.""" @@ -220,8 +253,9 @@ class ArtifactManager: self._unregister_artifacts(which_pass) def pass_done(self, which_pass): - """WHICH_PASS is done. Clean up all artifacts that are no longer - needed.""" + """WHICH_PASS is done. + + Clean up all artifacts that are no longer needed.""" self._active_passes.remove(which_pass) artifacts = self._unregister_artifacts(which_pass) @@ -237,20 +272,21 @@ class ArtifactManager: self._unregister_artifacts(which_pass) def check_clean(self): - """All passes have been processed. Output a warning messages if - all artifacts have not been accounted for. (This is mainly a - consistency check, that no artifacts were registered under - nonexistent passes.)""" + """All passes have been processed. + + Output a warning message if all artifacts have not been accounted + for.
(This is mainly a consistency check, that no artifacts were + registered under nonexistent passes.)""" - unclean_artifact_names = [ - artifact.name + unclean_artifacts = [ + str(artifact) for artifact in self._artifacts.values() if artifact._passes_needed] - if unclean_artifact_names: + if unclean_artifacts: Log().warn( 'INTERNAL: The following artifacts were not cleaned up:\n %s\n' - % ('\n '.join(unclean_artifact_names))) + % ('\n '.join(unclean_artifacts))) # The default ArtifactManager instance: diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/changeset.py cvs2svn-2.0.0/cvs2svn_lib/changeset.py --- cvs2svn-1.5.x/cvs2svn_lib/changeset.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/changeset.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,244 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. 
+# ==================================================================== + +"""Manage change sets.""" + + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.set_support import * +from cvs2svn_lib.common import InternalError +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.symbol import Branch +from cvs2svn_lib.symbol import Tag +from cvs2svn_lib.cvs_item import CVSRevision +from cvs2svn_lib.time_range import TimeRange +from cvs2svn_lib.changeset_graph_node import ChangesetGraphNode + + +class Changeset(object): + """A set of cvs_items that might potentially form a single change set.""" + + def __init__(self, id, cvs_item_ids): + self.id = id + self.cvs_item_ids = set(cvs_item_ids) + + def get_cvs_items(self): + """Return the set of CVSItems within this Changeset.""" + + return set(Ctx()._cvs_items_db.get_many(self.cvs_item_ids)) + + def create_graph_node(self, cvs_item_to_changeset_id): + """Return a ChangesetGraphNode for this Changeset.""" + + raise NotImplementedError() + + def create_split_changeset(self, id, cvs_item_ids): + """Return a Changeset with the specified contents. + + This method is only implemented for changesets that can be split. 
+ The type of the new changeset should be the same as that of SELF, + and any other information from SELF should also be copied to the + new changeset.""" + + raise NotImplementedError() + + def __getstate__(self): + return (self.id, list(self.cvs_item_ids),) + + def __setstate__(self, state): + (self.id, cvs_item_ids,) = state + self.cvs_item_ids = set(cvs_item_ids) + + def __cmp__(self, other): + raise NotImplementedError() + + def __str__(self): + raise NotImplementedError() + + def __repr__(self): + return '%s [%s]' % ( + self, ', '.join(['%x' % id for id in self.cvs_item_ids]),) + + +class RevisionChangeset(Changeset): + """A Changeset consisting of CVSRevisions.""" + + _sort_order = 3 + + def create_graph_node(self, cvs_item_to_changeset_id): + time_range = TimeRange() + pred_ids = set() + succ_ids = set() + + for cvs_item in self.get_cvs_items(): + time_range.add(cvs_item.timestamp) + + for pred_id in cvs_item.get_pred_ids(): + pred_ids.add(cvs_item_to_changeset_id[pred_id]) + + for succ_id in cvs_item.get_succ_ids(): + succ_ids.add(cvs_item_to_changeset_id[succ_id]) + + return ChangesetGraphNode(self, time_range, pred_ids, succ_ids) + + def create_split_changeset(self, id, cvs_item_ids): + return RevisionChangeset(id, cvs_item_ids) + + def __cmp__(self, other): + return cmp(self._sort_order, other._sort_order) \ + or cmp(self.id, other.id) + + def __str__(self): + return 'RevisionChangeset<%x>' % (self.id,) + + +class OrderedChangeset(Changeset): + """A Changeset of CVSRevisions whose preliminary order is known. + + The first changeset ordering involves only RevisionChangesets, and + results in a full ordering of RevisionChangesets (i.e., a linear + chain of dependencies with the order consistent with the + dependencies). 
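The changeset classes in this file each carry a `_sort_order` (tags 0, branches 1, ordered 2, revisions 3), and their `__cmp__` methods compare it before the id, so tag changesets always sort ahead of branch changesets, and both ahead of revision changesets. (In the real code, symbol changesets also break ties on the symbol.) Since `__cmp__` is Python 2 idiom, a modern equivalent is a sort key; the kind labels below are illustrative stand-ins for the classes:

```python
# _sort_order values taken from the changeset classes above:
SORT_ORDER = {'tag': 0, 'branch': 1, 'ordered': 2, 'revision': 3}

def changeset_sort_key(kind, changeset_id):
    """Compare by class _sort_order first, then by changeset id."""
    return (SORT_ORDER[kind], changeset_id)

# Tags sort first regardless of id:
changesets = [('revision', 0x1), ('branch', 0x9), ('tag', 0x7), ('tag', 0x3)]
ordered = sorted(changesets, key=lambda c: changeset_sort_key(*c))
```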
These OrderedChangesets form the skeleton for the + full topological sort that includes SymbolChangesets as well.""" + + _sort_order = 2 + + def __init__(self, id, cvs_item_ids, ordinal, prev_id, next_id): + Changeset.__init__(self, id, cvs_item_ids) + + # The order of this changeset among all OrderedChangesets: + self.ordinal = ordinal + + # The changeset id of the previous OrderedChangeset, or None if + # this is the first OrderedChangeset: + self.prev_id = prev_id + + # The changeset id of the next OrderedChangeset, or None if this + # is the last OrderedChangeset: + self.next_id = next_id + + def create_graph_node(self, cvs_item_to_changeset_id): + time_range = TimeRange() + + pred_ids = set() + succ_ids = set() + + if self.prev_id is not None: + pred_ids.add(self.prev_id) + + if self.next_id is not None: + succ_ids.add(self.next_id) + + for cvs_item in self.get_cvs_items(): + time_range.add(cvs_item.timestamp) + + for pred_id in cvs_item.get_symbol_pred_ids(): + pred_ids.add(cvs_item_to_changeset_id[pred_id]) + + for succ_id in cvs_item.get_symbol_succ_ids(): + succ_ids.add(cvs_item_to_changeset_id[succ_id]) + + return ChangesetGraphNode(self, time_range, pred_ids, succ_ids) + + def __getstate__(self): + return ( + Changeset.__getstate__(self), + self.ordinal, self.prev_id, self.next_id,) + + def __setstate__(self, state): + (changeset_state, self.ordinal, self.prev_id, self.next_id,) = state + Changeset.__setstate__(self, changeset_state) + + def __cmp__(self, other): + return cmp(self._sort_order, other._sort_order) \ + or cmp(self.id, other.id) + + def __str__(self): + return 'OrderedChangeset<%x(%d)>' % (self.id, self.ordinal,) + + +class SymbolChangeset(Changeset): + """A Changeset consisting of CVSSymbols.""" + + def __init__(self, id, symbol, cvs_item_ids): + Changeset.__init__(self, id, cvs_item_ids) + self.symbol = symbol + + def create_graph_node(self, cvs_item_to_changeset_id): + pred_ids = set() + succ_ids = set() + + for cvs_item in 
self.get_cvs_items(): + for pred_id in cvs_item.get_pred_ids(): + pred_ids.add(cvs_item_to_changeset_id[pred_id]) + + for succ_id in cvs_item.get_succ_ids(): + succ_ids.add(cvs_item_to_changeset_id[succ_id]) + + return ChangesetGraphNode(self, TimeRange(), pred_ids, succ_ids) + + def __cmp__(self, other): + return cmp(self._sort_order, other._sort_order) \ + or cmp(self.symbol, other.symbol) \ + or cmp(self.id, other.id) + + def __getstate__(self): + return (Changeset.__getstate__(self), self.symbol.id,) + + def __setstate__(self, state): + (changeset_state, symbol_id) = state + Changeset.__setstate__(self, changeset_state) + self.symbol = Ctx()._symbol_db.get_symbol(symbol_id) + + +class BranchChangeset(SymbolChangeset): + """A Changeset consisting of CVSBranches.""" + + _sort_order = 1 + + def create_split_changeset(self, id, cvs_item_ids): + return BranchChangeset(id, self.symbol, cvs_item_ids) + + def __str__(self): + return 'BranchChangeset<%x>("%s")' % (self.id, self.symbol,) + + +class TagChangeset(SymbolChangeset): + """A Changeset consisting of CVSTags.""" + + _sort_order = 0 + + def create_split_changeset(self, id, cvs_item_ids): + return TagChangeset(id, self.symbol, cvs_item_ids) + + def __str__(self): + return 'TagChangeset<%x>("%s")' % (self.id, self.symbol,) + + +def create_symbol_changeset(id, symbol, cvs_item_ids): + """Factory function for SymbolChangesets. + + Return a BranchChangeset or TagChangeset, depending on the type of + SYMBOL. 
SYMBOL must be a Branch or Tag.""" + + if isinstance(symbol, Branch): + return BranchChangeset(id, symbol, cvs_item_ids) + if isinstance(symbol, Tag): + return TagChangeset(id, symbol, cvs_item_ids) + else: + raise InternalError('Unknown symbol type %s' % (symbol,)) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/changeset_database.py cvs2svn-2.0.0/cvs2svn_lib/changeset_database.py --- cvs2svn-1.5.x/cvs2svn_lib/changeset_database.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/changeset_database.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,61 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. 
+# ==================================================================== + +"""This module contains classes to store changesets.""" + + +from __future__ import generators + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.changeset import Changeset +from cvs2svn_lib.changeset import RevisionChangeset +from cvs2svn_lib.changeset import OrderedChangeset +from cvs2svn_lib.changeset import SymbolChangeset +from cvs2svn_lib.changeset import BranchChangeset +from cvs2svn_lib.changeset import TagChangeset +from cvs2svn_lib.record_table import UnsignedIntegerPacker +from cvs2svn_lib.record_table import RecordTable +from cvs2svn_lib.database import IndexedStore +from cvs2svn_lib.serializer import PrimedPickleSerializer + + +def CVSItemToChangesetTable(filename, mode): + return RecordTable(filename, mode, UnsignedIntegerPacker()) + + +class ChangesetDatabase(IndexedStore): + def __init__(self, filename, index_filename, mode): + primer = ( + Changeset, + RevisionChangeset, + OrderedChangeset, + SymbolChangeset, + BranchChangeset, + TagChangeset, + ) + IndexedStore.__init__( + self, filename, index_filename, mode, PrimedPickleSerializer(primer)) + + def store(self, changeset): + self.add(changeset) + + def keys(self): + return list(self.iterkeys()) + + def close(self): + IndexedStore.close(self) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/changeset_graph.py cvs2svn-2.0.0/cvs2svn_lib/changeset_graph.py --- cvs2svn-1.5.x/cvs2svn_lib/changeset_graph.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/changeset_graph.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,417 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. 
+# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""The changeset dependency graph.""" + + +from __future__ import generators + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.set_support import * +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.log import Log +from cvs2svn_lib.changeset import RevisionChangeset +from cvs2svn_lib.changeset import OrderedChangeset +from cvs2svn_lib.changeset import BranchChangeset +from cvs2svn_lib.changeset import TagChangeset + + +class CycleInGraphException(Exception): + def __init__(self, cycle): + Exception.__init__( + self, + 'Cycle found in graph: %s' + % ' -> '.join(map(str, cycle + [cycle[0]]))) + + +class NoPredNodeInGraphException(Exception): + def __init__(self, node): + Exception.__init__(self, 'Node %s has no predecessors' % (node,)) + + +class ChangesetGraph(object): + """A graph of changesets and their dependencies.""" + + def __init__(self, changeset_db, cvs_item_to_changeset_id): + self._changeset_db = changeset_db + self._cvs_item_to_changeset_id = cvs_item_to_changeset_id + # A map { id : ChangesetGraphNode } + self.nodes = {} + + def close(self): + self._cvs_item_to_changeset_id.close() + self._cvs_item_to_changeset_id = None + self._changeset_db.close() + self._changeset_db = None + + def add_changeset(self, changeset): + """Add CHANGESET to this graph. + + Determine and record any dependencies to changesets that are + already in the graph. This method does not affect the databases.""" + + node = changeset.create_graph_node(self._cvs_item_to_changeset_id) + + # Now tie the node into our graph. 
If a changeset referenced by + # node is already in our graph, then add the backwards connection + # from the other node to the new one. If not, then delete the + # changeset from node. + + for pred_id in list(node.pred_ids): + pred_node = self.nodes.get(pred_id) + if pred_node is not None: + pred_node.succ_ids.add(node.id) + else: + node.pred_ids.remove(pred_id) + + for succ_id in list(node.succ_ids): + succ_node = self.nodes.get(succ_id) + if succ_node is not None: + succ_node.pred_ids.add(node.id) + else: + node.succ_ids.remove(succ_id) + + self.nodes[node.id] = node + + def store_changeset(self, changeset): + for cvs_item_id in changeset.cvs_item_ids: + self._cvs_item_to_changeset_id[cvs_item_id] = changeset.id + self._changeset_db.store(changeset) + + def add_new_changeset(self, changeset): + """Add the new CHANGESET to the graph and also to the databases.""" + + if Log().is_on(Log.DEBUG): + Log().debug('Adding changeset %r' % (changeset,)) + + self.add_changeset(changeset) + self.store_changeset(changeset) + + def delete_changeset(self, changeset): + """Remove CHANGESET from the graph and also from the databases. + + In fact, we don't remove CHANGESET from + self._cvs_item_to_changeset_id, because in practice the CVSItems + in CHANGESET are always added again as part of a new CHANGESET, + which will cause the old values to be overwritten.""" + + if Log().is_on(Log.DEBUG): + Log().debug('Removing changeset %r' % (changeset,)) + + del self[changeset.id] + del self._changeset_db[changeset.id] + + def __nonzero__(self): + """Instances are considered True iff they contain any nodes.""" + + return bool(self.nodes) + + def __contains__(self, id): + """Return True if the specified ID is contained in this graph.""" + + return id in self.nodes + + def __getitem__(self, id): + return self.nodes[id] + + def get(self, id): + return self.nodes.get(id) + + def __delitem__(self, id): + """Remove the node corresponding to ID. + + Also remove references to it from other nodes. 
This method does + not change pred_ids or succ_ids of the node being deleted, nor + does it affect the databases.""" + + node = self[id] + + for succ_id in node.succ_ids: + succ = self[succ_id] + succ.pred_ids.remove(node.id) + + for pred_id in node.pred_ids: + pred = self[pred_id] + pred.succ_ids.remove(node.id) + + del self.nodes[node.id] + + def keys(self): + return self.nodes.keys() + + def __iter__(self): + return self.nodes.itervalues() + + def _get_path(self, reachable_changesets, starting_node_id, ending_node_id): + """Return the shortest path from ENDING_NODE_ID to STARTING_NODE_ID. + + Find a path from ENDING_NODE_ID to STARTING_NODE_ID in + REACHABLE_CHANGESETS, where STARTING_NODE_ID is the id of a + changeset that depends on the changeset with ENDING_NODE_ID. (See + the comment in search_for_path() for a description of the format + of REACHABLE_CHANGESETS.) + + Return a list of changesets, where the 0th one has ENDING_NODE_ID + and the last one has STARTING_NODE_ID. If there is no such path + described in in REACHABLE_CHANGESETS, return None.""" + + if ending_node_id not in reachable_changesets: + return None + + path = [self._changeset_db[ending_node_id]] + id = reachable_changesets[ending_node_id][1] + while id != starting_node_id: + path.append(self._changeset_db[id]) + id = reachable_changesets[id][1] + path.append(self._changeset_db[starting_node_id]) + return path + + def search_for_path(self, starting_node_id, stop_set): + """Search for paths to prerequisites of STARTING_NODE_ID. + + Try to find the shortest dependency path that causes the changeset + with STARTING_NODE_ID to depend (directly or indirectly) on one of + the changesets whose ids are contained in STOP_SET. + + We consider direct and indirect dependencies in the sense that the + changeset can be reached by following a chain of predecessor nodes. 
+ + When one of the changeset_ids in STOP_SET is found, terminate the + search and return the path from that changeset_id to + STARTING_NODE_ID. If no path is found to a node in STOP_SET, + return None.""" + + # A map {node_id : (steps, next_node_id)} where NODE_ID can be + # reached from STARTING_NODE_ID in STEPS steps, and NEXT_NODE_ID + # is the id of the previous node in the path. STARTING_NODE_ID is + # only included as a key if there is a loop leading back to it. + reachable_changesets = {} + + # A list of (node_id, steps) that still have to be investigated, + # and STEPS is the number of steps to get to NODE_ID. + open_nodes = [(starting_node_id, 0)] + # A breadth-first search: + while open_nodes: + (id, steps) = open_nodes.pop(0) + steps += 1 + node = self[id] + for pred_id in node.pred_ids: + # Since the search is breadth-first, we only have to set steps + # that don't already exist. + if pred_id not in reachable_changesets: + reachable_changesets[pred_id] = (steps, id) + open_nodes.append((pred_id, steps)) + + # See if we can stop now: + if pred_id in stop_set: + return self._get_path( + reachable_changesets, starting_node_id, pred_id + ) + + return None + + def consume_nopred_nodes(self): + """Remove and yield changesets in dependency order. + + Each iteration, this generator yields a (changeset_id, time_range) + tuple for the oldest changeset in the graph that doesn't have any + predecessor nodes (i.e., it is ready to be committed). This is + continued until there are no more nodes without predecessors + (either because the graph has been emptied, or because of cycles + in the graph). + + Among the changesets that are ready to be processed, the earliest + one (according to the sorting of the TimeRange class) is yielded + each time. (This is the order in which the changesets should be + committed.) 
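Because the loop above repeatedly extracts the earliest node with no remaining predecessors, it amounts to a topological sort driven by a priority queue. A minimal sketch under assumed toy structures (integer timestamps instead of TimeRange objects, a plain dict instead of the graph-node classes); nodes caught in cycles never become ready and are simply not yielded, matching the behavior described above:

```python
import heapq

def consume_nopred_nodes(nodes):
    """Yield node ids in dependency order, earliest-ready first.

    NODES maps id -> (timestamp, pred_ids, succ_ids), a hypothetical
    stand-in for the changeset graph."""
    pred_count = dict((id, len(nodes[id][1])) for id in nodes)
    # Seed the heap with all nodes that have no predecessors:
    ready = [(nodes[id][0], id) for id in nodes if pred_count[id] == 0]
    heapq.heapify(ready)
    while ready:
        (timestamp, id) = heapq.heappop(ready)
        yield id
        # Removing this node may make some successors ready:
        for succ_id in nodes[id][2]:
            pred_count[succ_id] -= 1
            if pred_count[succ_id] == 0:
                heapq.heappush(ready, (nodes[succ_id][0], succ_id))
```

Using a heap avoids the repeated re-sorting of the ready list; the extraction order is the same.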
+ + The graph should not be otherwise altered while this generator is + running.""" + + def compare((node_1, changeset_1), (node_2, changeset_2)): + """Define an ordering on nopred_nodes elements.""" + + return cmp(node_1.time_range, node_2.time_range) \ + or cmp(changeset_1, changeset_2) + + # Find a list of (node,changeset,) where the node has no + # predecessors: + nopred_nodes = [ + (node, self._changeset_db[node.id],) + for node in self.nodes.itervalues() + if not node.pred_ids] + nopred_nodes.sort(compare) + while nopred_nodes: + (node, changeset,) = nopred_nodes.pop(0) + del self[node.id] + # See if any successors are now ready for extraction: + new_nodes_found = False + for succ_id in node.succ_ids: + succ = self[succ_id] + if not succ.pred_ids: + nopred_nodes.append( (succ, self._changeset_db[succ.id],) ) + new_nodes_found = True + if new_nodes_found: + # All this repeated sorting is very wasteful. We should + # instead use a heap to keep things coming out in order. But + # I highly doubt that this will be a bottleneck, so here we + # go. + nopred_nodes.sort(compare) + yield (node.id, node.time_range) + + def find_cycle(self, starting_node_id): + """Find a cycle in the dependency graph and return it. + + Use STARTING_NODE_ID as the place to start looking. This routine + must only be called after all nopred_nodes have been removed. + Return the list of changesets that are involved in the cycle + (ordered such that cycle[n-1] is a predecessor of cycle[n] and + cycle[-1] is a predecessor of cycle[0]).""" + + # Since there are no nopred nodes in the graph, all nodes in the + # graph must either be involved in a cycle or depend (directly or + # indirectly) on nodes that are in a cycle. + + # Pick an arbitrary node: + node = self[starting_node_id] + + seen_nodes = [node] + + # Follow it backwards until a node is seen a second time; then we + # have our cycle. + while True: + # Pick an arbitrary predecessor of node. 
It must exist, because + # there are no nopred nodes: + try: + node_id = node.pred_ids.__iter__().next() + except StopIteration: + raise NoPredNodeInGraphException(node) + node = self[node_id] + try: + i = seen_nodes.index(node) + except ValueError: + seen_nodes.append(node) + else: + seen_nodes = seen_nodes[i:] + seen_nodes.reverse() + return [self._changeset_db[node.id] for node in seen_nodes] + + def consume_graph(self, cycle_breaker=None): + """Remove and yield changesets from this graph in dependency order. + + Each iteration, this generator yields a (changeset_id, time_range) + tuple for the oldest changeset in the graph that doesn't have any + predecessor nodes. If CYCLE_BREAKER is specified, then call + CYCLE_BREAKER(cycle) whenever a cycle is encountered, where cycle + is the list of changesets that are involved in the cycle (ordered + such that cycle[n-1] is a predecessor of cycle[n] and cycle[-1] is + a predecessor of cycle[0]). CYCLE_BREAKER should break the cycle + in place then return. + + If a cycle is found and CYCLE_BREAKER was not specified, raise + CycleInGraphException.""" + + while True: + for (changeset_id, time_range) in self.consume_nopred_nodes(): + yield (changeset_id, time_range) + + # If there are any nodes left in the graph, then there must be + # at least one cycle. Find a cycle and process it. + + # This might raise StopIteration, but that indicates that the + # graph has been fully consumed, so we just let the exception + # escape. + start_node_id = self.nodes.iterkeys().next() + + cycle = self.find_cycle(start_node_id) + + if cycle_breaker is not None: + cycle_breaker(cycle) + else: + raise CycleInGraphException(cycle) + + def __repr__(self): + """For convenience only. 
The format is subject to change at any time.""" + + if self.nodes: + return 'ChangesetGraph:\n%s' \ + % ''.join([' %r\n' % node for node in self]) + else: + return 'ChangesetGraph:\n EMPTY\n' + + node_colors = { + RevisionChangeset : 'lightgreen', + OrderedChangeset : 'cyan', + BranchChangeset : 'orange', + TagChangeset : 'yellow', + } + + def output_coarse_dot(self, f): + """Output the graph in DOT format to file-like object f. + + Such a file can be rendered into a visual representation of the + graph using tools like graphviz. Include only changesets in the + graph, and the dependencies between changesets.""" + + f.write('digraph G {\n') + for node in self: + f.write( + ' C%x [style=filled, fillcolor=%s];\n' % ( + node.id, + self.node_colors[self._changeset_db[node.id].__class__], + ) + ) + f.write('\n') + + for node in self: + for succ_id in node.succ_ids: + f.write(' C%x -> C%x\n' % (node.id, succ_id,)) + f.write('\n') + + f.write('}\n') + + def output_fine_dot(self, f): + """Output the graph in DOT format to file-like object f. + + Such a file can be rendered into a visual representation of the + graph using tools like graphviz. Include all CVSItems and the + CVSItem-CVSItem dependencies in the graph. 
Group the CVSItems + into clusters by changeset.""" + + f.write('digraph G {\n') + for node in self: + f.write(' subgraph cluster_%x {\n' % (node.id,)) + f.write(' label = "C%x";\n' % (node.id,)) + changeset = self._changeset_db[node.id] + for item_id in changeset.cvs_item_ids: + f.write(' I%x;\n' % (item_id,)) + f.write(' style=filled;\n') + f.write( + ' fillcolor=%s;\n' + % (self.node_colors[self._changeset_db[node.id].__class__],)) + f.write(' }\n\n') + + for node in self: + changeset = self._changeset_db[node.id] + for cvs_item in changeset.get_cvs_items(): + for succ_id in cvs_item.get_succ_ids(): + f.write(' I%x -> I%x;\n' % (cvs_item.id, succ_id,)) + + f.write('\n') + + f.write('}\n') + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/changeset_graph_link.py cvs2svn-2.0.0/cvs2svn_lib/changeset_graph_link.py --- cvs2svn-1.5.x/cvs2svn_lib/changeset_graph_link.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/changeset_graph_link.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,148 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. 
+# ==================================================================== + +"""Keep track of counts of different types of changeset links.""" + + +from __future__ import generators + +from cvs2svn_lib.boolean import * + + +# A cvs_item doesn't depend on any cvs_items in either pred or succ: +LINK_NONE = 0 + +# A cvs_item depends on one or more cvs_items in pred but none in succ: +LINK_PRED = 1 + +# A cvs_item depends on one or more cvs_items in succ but none in pred: +LINK_SUCC = 2 + +# A cvs_item depends on one or more cvs_items in both pred and succ: +LINK_PASSTHRU = LINK_PRED | LINK_SUCC + + +def _get_link_type(pred, cvs_item, succ): + """Return the type of links from CVS_ITEM to changesets PRED and SUCC. + + The return value is one of LINK_NONE, LINK_PRED, LINK_SUCC, or + LINK_PASSTHRU.""" + + retval = LINK_NONE + if cvs_item.get_pred_ids() & pred.cvs_item_ids: + retval |= LINK_PRED + if cvs_item.get_succ_ids() & succ.cvs_item_ids: + retval |= LINK_SUCC + return retval + + +class ChangesetGraphLink(object): + def __init__(self, pred, changeset, succ): + """Represent a link in a loop in a changeset graph. + + This is the link that goes from PRED -> CHANGESET -> SUCC. + + We are mainly concerned with how many CVSItems have LINK_PRED, + LINK_SUCC, and LINK_PASSTHRU type links to the neighboring + commitsets. 
If necessary, this class can also break up CHANGESET + into multiple changesets.""" + + self.pred = pred + self.changeset = changeset + self.succ = succ + + # A count of each type of link for cvs_items in changeset + # (indexed by LINK_* constants): + link_counts = [0] * 4 + + for cvs_item in list(changeset.get_cvs_items()): + link_counts[_get_link_type(self.pred, cvs_item, self.succ)] += 1 + + [self.pred_links, self.succ_links, self.passthru_links] = link_counts[1:] + + def get_links_to_move(self): + """Return the number of items that would be moved to split changeset.""" + + return min(self.pred_links, self.succ_links) \ + or max(self.pred_links, self.succ_links) + + def is_breakable(self): + """Return True iff breaking the changeset will do any good.""" + + return self.pred_links != 0 or self.succ_links != 0 + + def __cmp__(self, other): + """Compare SELF with OTHER in terms of which would be better to break. + + The one that is better to break is considered the lesser.""" + + return ( + - cmp(int(self.is_breakable()), int(other.is_breakable())) + or cmp(self.passthru_links, other.passthru_links) + or cmp(self.get_links_to_move(), other.get_links_to_move()) + ) + + def break_changeset(self, changeset_key_generator): + """Break up self.changeset and return the fragments. + + Break it up in such a way that the link is weakened as efficiently + as possible.""" + + if not self.is_breakable(): + raise ValueError('Changeset is not breakable: %r' % self.changeset) + + pred_items = [] + succ_items = [] + + # For each link type, should such CVSItems be moved to the + # changeset containing the predecessor items or the one containing + # the successor items? 
+ destination = { + LINK_PRED : pred_items, + LINK_SUCC : succ_items, + } + + if self.pred_links == 0: + destination[LINK_NONE] = pred_items + destination[LINK_PASSTHRU] = pred_items + elif self.succ_links == 0: + destination[LINK_NONE] = succ_items + destination[LINK_PASSTHRU] = succ_items + elif self.pred_links < self.succ_links: + destination[LINK_NONE] = succ_items + destination[LINK_PASSTHRU] = succ_items + else: + destination[LINK_NONE] = pred_items + destination[LINK_PASSTHRU] = pred_items + + for cvs_item in self.changeset.get_cvs_items(): + link_type = _get_link_type(self.pred, cvs_item, self.succ) + destination[link_type].append(cvs_item.id) + + # Create new changesets of the same type as the old one: + return [ + self.changeset.create_split_changeset( + changeset_key_generator.gen_id(), pred_items), + self.changeset.create_split_changeset( + changeset_key_generator.gen_id(), succ_items), + ] + + def __str__(self): + return 'Link<%x>(%d, %d, %d)' % ( + self.changeset.id, + self.pred_links, self.succ_links, self.passthru_links) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/changeset_graph_node.py cvs2svn-2.0.0/cvs2svn_lib/changeset_graph_node.py --- cvs2svn-1.5.x/cvs2svn_lib/changeset_graph_node.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/changeset_graph_node.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,58 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. 
For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""A node in the changeset dependency graph.""" + + +from __future__ import generators + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.set_support import * +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.time_range import TimeRange + + +class ChangesetGraphNode(object): + """A node in the changeset dependency graph.""" + + __slots__ = ['id', 'time_range', 'pred_ids', 'succ_ids'] + + def __init__(self, changeset, time_range, pred_ids, succ_ids): + # The id of the ChangesetGraphNode is the same as the id of the + # changeset. + self.id = changeset.id + + # The range of times of CVSItems within this Changeset. + self.time_range = time_range + + # The set of changeset ids of changesets that are direct + # predecessors of this one. + self.pred_ids = pred_ids + + # The set of changeset ids of changesets that are direct + # successors of this one. + self.succ_ids = succ_ids + + def __repr__(self): + """For convenience only. The format is subject to change at any time.""" + + return '%x; pred=[%s]; succ=[%s]' % ( + self.id, + ','.join(['%x' % id for id in self.pred_ids]), + ','.join(['%x' % id for id in self.succ_ids]), + ) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/check_dependencies_pass.py cvs2svn-2.0.0/cvs2svn_lib/check_dependencies_pass.py --- cvs2svn-1.5.x/cvs2svn_lib/check_dependencies_pass.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/check_dependencies_pass.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,140 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2000-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. 
The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module defines some passes that can be used for debugging cvs2svn.""" + + +from __future__ import generators + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.set_support import * +from cvs2svn_lib import config +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.common import FatalException +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.log import Log +from cvs2svn_lib.pass_manager import Pass +from cvs2svn_lib.artifact_manager import artifact_manager +from cvs2svn_lib.cvs_file_database import CVSFileDatabase +from cvs2svn_lib.symbol_database import SymbolDatabase +from cvs2svn_lib.cvs_item_database import OldCVSItemStore +from cvs2svn_lib.cvs_item_database import IndexedCVSItemStore + + +class CheckDependenciesPass(Pass): + """Check that the dependencies are self-consistent.""" + + def __init__(self): + Pass.__init__(self) + + def register_artifacts(self): + self._register_temp_file_needed(config.SYMBOL_DB) + self._register_temp_file_needed(config.CVS_FILES_DB) + + def iter_cvs_items(self): + raise NotImplementedError() + + def get_cvs_item(self, item_id): + raise NotImplementedError() + + def run(self, stats_keeper): + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + self.symbol_db = SymbolDatabase() + Ctx()._symbol_db = self.symbol_db + + Log().quiet("Checking dependency consistency...") + + fatal_errors = [] + for cvs_item in self.iter_cvs_items(): + # Check that the pred_ids and succ_ids are mutually consistent: + for pred_id in cvs_item.get_pred_ids(): + pred =
self.get_cvs_item(pred_id) + if not cvs_item.id in pred.get_succ_ids(): + fatal_errors.append( + '%s lists pred=%s, but not vice versa.' % (cvs_item, pred,)) + + for succ_id in cvs_item.get_succ_ids(): + succ = self.get_cvs_item(succ_id) + if not cvs_item.id in succ.get_pred_ids(): + fatal_errors.append( + '%s lists succ=%s, but not vice versa.' % (cvs_item, succ,)) + + if fatal_errors: + raise FatalException("Dependencies inconsistent:\n" + + "\n".join(fatal_errors) + "\n" + + "Exited due to fatal error(s).\n") + + self.symbol_db.close() + self.symbol_db = None + Ctx()._cvs_file_db.close() + Log().quiet("Done") + + +class CheckItemStoreDependenciesPass(CheckDependenciesPass): + def __init__(self, cvs_items_store_file): + CheckDependenciesPass.__init__(self) + self.cvs_items_store_file = cvs_items_store_file + + def register_artifacts(self): + CheckDependenciesPass.register_artifacts(self) + self._register_temp_file_needed(self.cvs_items_store_file) + + def iter_cvs_items(self): + cvs_item_store = OldCVSItemStore( + artifact_manager.get_temp_file(self.cvs_items_store_file)) + + for cvs_file_items in cvs_item_store.iter_cvs_file_items(): + self.current_cvs_file_items = cvs_file_items + for cvs_item in cvs_file_items.values(): + yield cvs_item + + del self.current_cvs_file_items + + cvs_item_store.close() + + def get_cvs_item(self, item_id): + return self.current_cvs_file_items[item_id] + + +class CheckIndexedItemStoreDependenciesPass(CheckDependenciesPass): + def __init__(self, cvs_items_store_file, cvs_items_store_index_file): + CheckDependenciesPass.__init__(self) + self.cvs_items_store_file = cvs_items_store_file + self.cvs_items_store_index_file = cvs_items_store_index_file + + def register_artifacts(self): + CheckDependenciesPass.register_artifacts(self) + self._register_temp_file_needed(self.cvs_items_store_file) + self._register_temp_file_needed(self.cvs_items_store_index_file) + + def iter_cvs_items(self): + return self.cvs_item_store.itervalues() + + def 
get_cvs_item(self, item_id): + return self.cvs_item_store[item_id] + + def run(self, stats_keeper): + self.cvs_item_store = IndexedCVSItemStore( + artifact_manager.get_temp_file(self.cvs_items_store_file), + artifact_manager.get_temp_file(self.cvs_items_store_index_file), + DB_OPEN_READ) + + CheckDependenciesPass.run(self, stats_keeper) + + self.cvs_item_store.close() + self.cvs_item_store = None + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/checkout_internal.py cvs2svn-2.0.0/cvs2svn_lib/checkout_internal.py --- cvs2svn-1.5.x/cvs2svn_lib/checkout_internal.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/checkout_internal.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,688 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module contains classes that implement the --use-internal-co option. + +The idea is to patch up the revisions' contents incrementally, thus +avoiding the huge number of process spawns and the O(n^2) overhead of +using 'co' and 'cvs'. + +InternalRevisionRecorder saves the RCS deltas and RCS revision trees +to databases. Notably, deltas from the trunk need to be reversed, as +CVS stores them so they apply from HEAD backwards. 
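The trunk layout can be illustrated with a toy model: RCS stores the HEAD fulltext plus one delta per older trunk revision, each converting revision N into revision N-1, so older texts are recovered by walking from HEAD backwards. In the sketch below, `apply_delta` is a hypothetical helper (here just a string replacement), not the RCS delta machinery:

```python
def trunk_fulltexts(head_text, stored_deltas, apply_delta):
    """Return the fulltexts of all trunk revisions, oldest first.

    STORED_DELTAS are in RCS storage order (HEAD backwards);
    APPLY_DELTA(text, delta) applies one delta to a fulltext."""
    texts = [head_text]
    for delta in stored_deltas:
        # Each stored delta turns revision N into revision N-1:
        texts.append(apply_delta(texts[-1], delta))
    # Storage order is newest-first; emit oldest-first:
    texts.reverse()
    return texts
```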
+ +InternalRevisionExcluder copies the revision trees to a new database, +omitting excluded branches. + +InternalRevisionReader produces the revisions' contents on demand. To +generate the text for a typical revision, we need the revision's delta +text plus the fulltext of the previous revision. Therefore, we +maintain a checkout database containing a copy of the fulltext of any +revision for which subsequent revisions still need to be retrieved. +It is crucial to remove text from this database as soon as it is no +longer needed, to prevent it from growing enormous. + +There are two reasons that the text from a revision can be needed: (1) +because the revision itself still needs to be output to a dumpfile; +(2) because another revision needs it as the base of its delta. We +maintain a reference count for each revision, which includes *both* +possibilities. The first time a revision's text is needed, it is +generated by applying the revision's deltatext to the previous +revision's fulltext, and the resulting fulltext is stored in the +checkout database. Each time a revision's fulltext is retrieved, its +reference count is decremented. When the reference count goes to +zero, then the fulltext is deleted from the checkout database. + +The administrative data for managing this consists of one TextRecord +entry for each revision. Each TextRecord has an id, which is the same +id as used for the corresponding CVSRevision instance. It also +maintains a count of the times it is expected to be retrieved. +TextRecords come in several varieties: + +FullTextRecord -- Used for revisions whose fulltext is contained + directly in the RCS file, and therefore available during + CollectRevsPass (i.e., typically revision 1.1 of each file). + +DeltaTextRecord -- Used for revisions that are defined via a delta + relative to some other TextRecord. These records record the id of + the TextRecord that holds the base text against which the delta is + defined. 
When the text for a DeltaTextRecord is retrieved, the + DeltaTextRecord instance is deleted and a CheckedOutTextRecord + instance is created to take its place. + +CheckedOutTextRecord -- Used during OutputPass for a revision that + started out as a DeltaTextRecord, but has already been retrieved + (and therefore its fulltext is stored in the checkout database). + +While a file is being processed during CollectRevsPass, the fulltext +and deltas are stored to the delta database, and TextRecord instances +are created to keep track of things. The reference counts are all +initialized to zero. + +After CollectRevsPass has done any preliminary tree mangling, its +_FileDataCollector.parse_completed() method calls +RevisionRecorder.finish_file(), passing it the CVSFileItems instance +that describes the revisions in the file. At this point the reference +counts for the file's TextRecords are updated: each record referred to +by a delta has its refcount incremented, and each record that +corresponds to a non-delete CVSRevision has its refcount incremented. +After that, any records with refcount==0 are removed. When one record +is removed, that can cause another record's reference count to go to +zero and be removed too, recursively. When a TextRecord is deleted at +this stage, its deltatext is also deleted from the delta database.
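The refcount bookkeeping and cascading deletion described above can be sketched in isolation. The following is a toy model, not the cvs2svn classes themselves: `Record`, `recompute_refcounts`, and `free_unused` are invented names standing in for TextRecord/TextRecordDatabase. It mirrors the two refcount sources (delta bases and needed revisions) and drives the cascade with an explicit worklist, as the real code does with `deferred_deletes`, to avoid deep recursion on long revision chains.

```python
# Toy model of the refcount scheme; names are illustrative only.

class Record:
    def __init__(self, id, pred_id=None):
        self.id = id            # revision id
        self.pred_id = pred_id  # id of the delta base, if any
        self.refcount = 0

def recompute_refcounts(records, needed_ids):
    for r in records.values():
        r.refcount = 0
    # Each record that serves as a delta base is needed once per dependent:
    for r in records.values():
        if r.pred_id is not None:
            records[r.pred_id].refcount += 1
    # Each revision that will still be output is needed once more:
    for id in needed_ids:
        records[id].refcount += 1

def free_unused(records):
    # Explicit worklist instead of recursion: freeing one record may
    # drop its base's refcount to zero, cascading down the chain.
    worklist = [r.id for r in records.values() if r.refcount == 0]
    while worklist:
        id = worklist.pop()
        pred_id = records.pop(id).pred_id
        if pred_id is not None:
            pred = records[pred_id]
            pred.refcount -= 1
            if pred.refcount == 0:
                worklist.append(pred_id)
```

With a delta chain 1 <- 2 <- 3 where revision 3 is still needed, all three records survive (each is a delta base or needed); if no revision is needed, the whole chain is freed in one cascade.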
+ +In FilterSymbolsPass, the exact same procedure (described in the +previous paragraph) is repeated, but this time using the CVSFileItems +after it has been updated for excluded symbols, symbol +preferred-parent grafting, etc.""" + + +from __future__ import generators + +import cStringIO +import re +import types + +from cvs2svn_lib.set_support import * +from cvs2svn_lib import config +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.common import warning_prefix +from cvs2svn_lib.common import InternalError +from cvs2svn_lib.log import Log +from cvs2svn_lib.artifact_manager import artifact_manager +from cvs2svn_lib.symbol import Symbol +from cvs2svn_lib.cvs_item import CVSRevisionModification +from cvs2svn_lib.cvs_item import CVSRevisionDelete +from cvs2svn_lib.collect_data import is_trunk_revision +from cvs2svn_lib.database import Database +from cvs2svn_lib.database import IndexedDatabase +from cvs2svn_lib.rcs_stream import RCSStream +from cvs2svn_lib.revision_recorder import RevisionRecorder +from cvs2svn_lib.revision_excluder import RevisionExcluder +from cvs2svn_lib.revision_reader import RevisionReader +from cvs2svn_lib.serializer import StringSerializer +from cvs2svn_lib.serializer import CompressingSerializer +from cvs2svn_lib.serializer import PrimedPickleSerializer + + +class TextRecord(object): + """Bookkeeping data for the text of a single CVSRevision.""" + + __slots__ = ['id', 'refcount'] + + def __init__(self, id): + # The cvs_rev_id of the revision whose text this is. + self.id = id + + # The number of times that the text of this revision will be + # retrieved. 
+ self.refcount = 0 + + def __getstate__(self): + return (self.id, self.refcount,) + + def __setstate__(self, state): + (self.id, self.refcount,) = state + + def increment_dependency_refcounts(self, text_record_db): + """Increment the refcounts of any records that this one depends on.""" + + pass + + def decrement_refcount(self, text_record_db): + """Decrement the number of times our text still has to be checked out. + + If the reference count goes to zero, call discard().""" + + self.refcount -= 1 + if self.refcount == 0: + text_record_db.discard(self.id) + + def checkout(self, text_record_db): + """Workhorse of the checkout process. + + Return the text for this revision, decrement our reference count, + and update the databases depending on whether there will be future + checkouts.""" + + raise NotImplementedError() + + def free(self, text_record_db): + """This instance will never again be checked out; free it. + + Also free any associated resources and decrement the refcounts of + any other TextRecords that this one depends on.""" + + raise NotImplementedError() + + +class FullTextRecord(TextRecord): + __slots__ = [] + + def __getstate__(self): + return (self.id, self.refcount,) + + def __setstate__(self, state): + (self.id, self.refcount,) = state + + def checkout(self, text_record_db): + text = text_record_db.delta_db[self.id] + self.decrement_refcount(text_record_db) + return text + + def free(self, text_record_db): + del text_record_db.delta_db[self.id] + + def __str__(self): + return 'FullTextRecord(%x, %d)' % (self.id, self.refcount,) + + +class DeltaTextRecord(TextRecord): + __slots__ = ['pred_id'] + + def __init__(self, id, pred_id): + TextRecord.__init__(self, id) + + # The cvs_rev_id of the revision relative to which this delta is + # defined. 
+ self.pred_id = pred_id + + def __getstate__(self): + return (self.id, self.refcount, self.pred_id,) + + def __setstate__(self, state): + (self.id, self.refcount, self.pred_id,) = state + + def increment_dependency_refcounts(self, text_record_db): + text_record_db[self.pred_id].refcount += 1 + + def checkout(self, text_record_db): + base_text = text_record_db[self.pred_id].checkout(text_record_db) + co = RCSStream(base_text) + delta_text = text_record_db.delta_db[self.id] + co.apply_diff(delta_text) + text = co.get_text() + del co + self.refcount -= 1 + if self.refcount == 0: + # This text will never be needed again; just delete ourselves + # without ever having stored the fulltext to the checkout + # database: + del text_record_db[self.id] + else: + # Store a new CheckedOutTextRecord in place of ourselves: + text_record_db.checkout_db['%x' % self.id] = text + new_text_record = CheckedOutTextRecord(self.id) + new_text_record.refcount = self.refcount + text_record_db.replace(new_text_record) + return text + + def free(self, text_record_db): + del text_record_db.delta_db[self.id] + text_record_db[self.pred_id].decrement_refcount(text_record_db) + + def __str__(self): + return 'DeltaTextRecord(%x -> %x, %d)' \ + % (self.pred_id, self.id, self.refcount,) + + +class CheckedOutTextRecord(TextRecord): + __slots__ = [] + + def __getstate__(self): + return (self.id, self.refcount,) + + def __setstate__(self, state): + (self.id, self.refcount,) = state + + def checkout(self, text_record_db): + text = text_record_db.checkout_db['%x' % self.id] + self.decrement_refcount(text_record_db) + return text + + def free(self, text_record_db): + del text_record_db.checkout_db['%x' % self.id] + + def __str__(self): + return 'CheckedOutTextRecord(%x, %d)' % (self.id, self.refcount,) + + +class NullDatabase(object): + """A do-nothing database that can be used with TextRecordDatabase. 
+ + Use this when you don't actually want to allow anything to be + deleted.""" + + def __delitem__(self, id): + pass + + +class TextRecordDatabase: + """Holds the TextRecord instances that are currently live. + + During CollectRevsPass and FilterSymbolsPass, files are processed + one by one and a new TextRecordDatabase instance is used for each + file. During OutputPass, a single TextRecordDatabase instance is + used for the duration of OutputPass; individual records are added + and removed when they are active.""" + + def __init__(self, delta_db, checkout_db): + # A map { cvs_rev_id -> TextRecord }. + self.text_records = {} + + # A database-like object using cvs_rev_ids as keys and containing + # fulltext/deltatext strings as values. Its __getitem__() method + # is used to retrieve deltas when they are needed, and its + # __delitem__() method is used to delete deltas when they can be + # freed. The modifiability of the delta database varies from pass + # to pass, so the object stored here varies as well: + # + # CollectRevsPass: a fully-functional IndexedDatabase. This + # allows deltas that will not be needed to be deleted. + # + # FilterSymbolsPass: a NullDatabase. The delta database cannot be + # modified during this pass, and we have no need to retrieve + # deltas, so we just use a dummy object here. + # + # OutputPass: a disabled IndexedDatabase. During this pass we + # need to retrieve deltas, but we are not allowed to modify the + # delta database. So we use an IndexedDatabase whose __del__() + # method has been disabled to do nothing. + self.delta_db = delta_db + + # A database-like object using cvs_rev_ids as keys and containing + # fulltext strings as values. This database is only set during + # OutputPass. + self.checkout_db = checkout_db + + # If this is set to a list, then the list holds the ids of + # text_records that have to be deleted; when discard() is called, + # it adds the requested id to the list but does not delete it. 
If + # this member is set to None, then text_records are deleted + # immediately when discard() is called. + self.deferred_deletes = None + + def __getstate__(self): + return self.text_records.values() + + def __setstate__(self, state): + self.text_records = {} + for text_record in state: + self.add(text_record) + self.delta_db = NullDatabase() + self.checkout_db = NullDatabase() + self.deferred_deletes = None + + def add(self, text_record): + """Add TEXT_RECORD to our database. + + There must not already be a record with the same id.""" + + assert not self.text_records.has_key(text_record.id) + + self.text_records[text_record.id] = text_record + + def __getitem__(self, id): + return self.text_records[id] + + def __delitem__(self, id): + """Free the record with the specified ID.""" + + del self.text_records[id] + + def replace(self, text_record): + """Store TEXT_RECORD in place of the existing record with the same id. + + Do not do anything with the old record.""" + + assert self.text_records.has_key(text_record.id) + self.text_records[text_record.id] = text_record + + def discard(self, *ids): + """The text records with IDS are no longer needed; discard them. + + This involves calling their free() methods and also removing them + from SELF. + + If SELF.deferred_deletes is not None, then the ids to be deleted + are added to the list instead of deleted immediately. This + mechanism is to prevent a stack overflow from the avalanche of + deletes that can result from deleting a long chain of revisions.""" + + if self.deferred_deletes is None: + # This is an outer-level delete. 
+ self.deferred_deletes = list(ids) + while self.deferred_deletes: + id = self.deferred_deletes.pop() + text_record = self[id] + if text_record.refcount != 0: + raise InternalError( + 'TextRecordDatabase.discard(%s) called with refcount = %d' + % (text_record, text_record.refcount,) + ) + # This call might cause other text_record ids to be added to + # self.deferred_deletes: + text_record.free(self) + del self[id] + self.deferred_deletes = None + else: + self.deferred_deletes.extend(ids) + + def itervalues(self): + return self.text_records.itervalues() + + def recompute_refcounts(self, cvs_file_items): + """Recompute the refcounts of the contained TextRecords. + + Use CVS_FILE_ITEMS to determine which records will be needed by + cvs2svn.""" + + # First clear all of the refcounts: + for text_record in self.itervalues(): + text_record.refcount = 0 + + # Now increment the reference count of records that are needed as + # the source of another record's deltas: + for text_record in self.itervalues(): + text_record.increment_dependency_refcounts(self.text_records) + + # Now increment the reference count of records that will be needed + # by cvs2svn: + for lod_items in cvs_file_items.iter_lods(): + for cvs_rev in lod_items.cvs_revisions: + if isinstance(cvs_rev, CVSRevisionModification): + self[cvs_rev.id].refcount += 1 + + def free_unused(self): + """Free any TextRecords whose reference counts are zero.""" + + # The deletion of some of these text records might cause others to + # be unused, in which case they will be deleted automatically. + # But since the initially-unused records are not referred to by + # any others, we don't have to be afraid that they will be deleted + # before we get to them. But it *is* crucial that we create the + # whole unused list before starting the loop. 
+ + unused = [ + text_record.id + for text_record in self.itervalues() + if text_record.refcount == 0 + ] + + self.discard(*unused) + + def log_leftovers(self): + """If any TextRecords still exist, log them.""" + + if self.text_records: + Log().warn( + "%s: internal problem: leftover revisions in the checkout cache:" + % warning_prefix) + for text_record in self.itervalues(): + Log().warn(' %s' % (text_record,)) + + def __repr__(self): + """Debugging output of the current contents of the TextRecordDatabase.""" + + retval = ['TextRecordDatabase:'] + for text_record in self.itervalues(): + retval.append(' %s' % (text_record,)) + return '\n'.join(retval) + + +class InternalRevisionRecorder(RevisionRecorder): + """A RevisionRecorder that reconstructs the fulltext internally.""" + + def __init__(self, compress): + self._compress = compress + + def register_artifacts(self, which_pass): + which_pass._register_temp_file(config.RCS_DELTAS_INDEX_TABLE) + which_pass._register_temp_file(config.RCS_DELTAS_STORE) + which_pass._register_temp_file(config.RCS_TREES_INDEX_TABLE) + which_pass._register_temp_file(config.RCS_TREES_STORE) + + def start(self): + ser = StringSerializer() + if self._compress: + ser = CompressingSerializer(ser) + self._rcs_deltas = IndexedDatabase( + artifact_manager.get_temp_file(config.RCS_DELTAS_STORE), + artifact_manager.get_temp_file(config.RCS_DELTAS_INDEX_TABLE), + DB_OPEN_NEW, ser) + primer = (FullTextRecord, DeltaTextRecord) + self._rcs_trees = IndexedDatabase( + artifact_manager.get_temp_file(config.RCS_TREES_STORE), + artifact_manager.get_temp_file(config.RCS_TREES_INDEX_TABLE), + DB_OPEN_NEW, PrimedPickleSerializer(primer)) + + def start_file(self, cvs_file): + self._cvs_file = cvs_file + + # A map from cvs_rev_id to TextRecord instance: + self.text_record_db = TextRecordDatabase(self._rcs_deltas, NullDatabase()) + + def record_text(self, revisions_data, revision, log, text): + revision_data = revisions_data[revision] + if 
is_trunk_revision(revision): + # On trunk, revisions are encountered in reverse order (1.<N> + # ... 1.1) and deltas are inverted. The first text that we see + # is the fulltext for the HEAD revision. After that, the text + # corresponding to revision 1.N is the delta (1.<N+1> -> + # 1.<N>)). We have to invert the deltas here so that we can + # read the revisions out in dependency order; that is, for + # revision 1.1 we want the fulltext, and for revision 1.<N> we + # want the delta (1.<N-1> -> 1.<N>). This means that we can't + # compute the delta for a revision until we see its logical + # parent. When we finally see revision 1.1 (which is recognized + # because it doesn't have a parent), we can record the diff (1.1 + # -> 1.2) for revision 1.2, and also the fulltext for 1.1. + + if revision_data.child is None: + # This is HEAD, as fulltext. Initialize the RCSStream so + # that we can compute deltas backwards in time. + self._stream = RCSStream(text) + else: + # Any other trunk revision is a backward delta. Apply the + # delta to the RCSStream to mutate it to the contents of this + # revision, and also to get the reverse delta, which we store + # as the forward delta of our child revision. + text = self._stream.invert_diff(text) + text_record = DeltaTextRecord( + revisions_data[revision_data.child].cvs_rev_id, + revision_data.cvs_rev_id + ) + self._writeout(text_record, text) + + if revision_data.parent is None: + # This is revision 1.1. Write its fulltext: + text_record = FullTextRecord(revision_data.cvs_rev_id) + self._writeout(text_record, self._stream.get_text()) + + # There will be no more trunk revisions delivered, so free the + # RCSStream. + del self._stream + + else: + # On branches, revisions are encountered in logical order + # (<BRANCH>.1 ... <BRANCH>.<N>) and the text corresponding to + # revision <BRANCH>.<N> is the forward delta (<BRANCH>.<N-1> -> + # <BRANCH>.<N>). That's what we need, so just store it. 
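The trunk ordering logic in the comments above can be illustrated with a toy model in which a "delta" is simply the revision's replacement fulltext, so inverting a backward delta is implicit; the real code instead calls RCSStream.invert_diff() on genuine RCS deltas. All names below (`record_trunk_texts`, the tuple layout of `arrivals`) are invented for illustration:

```python
def record_trunk_texts(arrivals):
    # ARRIVALS lists trunk revisions in the order rcsparse delivers them
    # (HEAD first), as (rev, child_rev, parent_rev, text) tuples.  The
    # first text is the HEAD fulltext; each later one is a backward
    # delta, modeled here as the revision's own fulltext.
    stored = {}
    current = None  # fulltext of the most recently processed revision
    for rev, child, parent, text in arrivals:
        if child is None:
            # HEAD arrives as fulltext; it seeds the running text.
            current = text
        else:
            # 'text' steps the running text back to REV; the forward
            # delta (REV -> CHILD) is stored under the child's id.  In
            # this toy model that forward delta is just the child's
            # fulltext, which is the current running text.
            stored[child] = current
            current = text
        if parent is None:
            # Revision 1.1: store its fulltext directly.
            stored[rev] = current
    return stored
```

Feeding it 1.3 (HEAD), then 1.2, then 1.1 yields each revision's text keyed under its own number, i.e. everything needed to check the revisions out again in dependency order (1.1 first).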
+ + # FIXME: It would be nice to avoid writing out branch deltas + # when --trunk-only. (They will be deleted when finish_file() + # is called, but if the delta db is in an IndexedDatabase the + # deletions won't actually recover any disk space.) + text_record = DeltaTextRecord( + revision_data.cvs_rev_id, + revisions_data[revision_data.parent].cvs_rev_id + ) + self._writeout(text_record, text) + + return None + + def _writeout(self, text_record, text): + self.text_record_db.add(text_record) + self._rcs_deltas[text_record.id] = text + + def finish_file(self, cvs_file_items): + """Finish processing of the current file. + + Compute the initial text record refcounts, discard any records + that are unneeded, and store the text records for the file to the + _rcs_trees database.""" + + self.text_record_db.recompute_refcounts(cvs_file_items) + self.text_record_db.free_unused() + self._rcs_trees[self._cvs_file.id] = self.text_record_db + del self._cvs_file + del self.text_record_db + + def finish(self): + self._rcs_deltas.close() + self._rcs_trees.close() + + +class InternalRevisionExcluder(RevisionExcluder): + """The RevisionExcluder used by InternalRevisionReader.""" + + def register_artifacts(self, which_pass): + which_pass._register_temp_file_needed(config.RCS_TREES_STORE) + which_pass._register_temp_file_needed(config.RCS_TREES_INDEX_TABLE) + which_pass._register_temp_file(config.RCS_TREES_FILTERED_STORE) + which_pass._register_temp_file(config.RCS_TREES_FILTERED_INDEX_TABLE) + + def start(self): + self._tree_db = IndexedDatabase( + artifact_manager.get_temp_file(config.RCS_TREES_STORE), + artifact_manager.get_temp_file(config.RCS_TREES_INDEX_TABLE), + DB_OPEN_READ) + primer = (FullTextRecord, DeltaTextRecord) + self._new_tree_db = IndexedDatabase( + artifact_manager.get_temp_file(config.RCS_TREES_FILTERED_STORE), + artifact_manager.get_temp_file(config.RCS_TREES_FILTERED_INDEX_TABLE), + DB_OPEN_NEW, PrimedPickleSerializer(primer)) + + def process_file(self, 
cvs_file_items): + text_record_db = self._tree_db[cvs_file_items.cvs_file.id] + text_record_db.recompute_refcounts(cvs_file_items) + text_record_db.free_unused() + self._new_tree_db[cvs_file_items.cvs_file.id] = text_record_db + + def skip_file(self, cvs_file): + text_record_db = self._tree_db[cvs_file.id] + self._new_tree_db[cvs_file.id] = text_record_db + + def finish(self): + self._tree_db.close() + self._new_tree_db.close() + + +class InternalRevisionReader(RevisionReader): + """A RevisionReader that reads the contents from its own delta store.""" + + _kw_re = re.compile( + r'\$(' + + r'Author|Date|Header|Id|Name|Locker|Log|RCSfile|Revision|Source|State' + + r'):[^$\n]*\$') + + def __init__(self, compress): + self._compress = compress + + def register_artifacts(self, which_pass): + which_pass._register_temp_file(config.CVS_CHECKOUT_DB) + which_pass._register_temp_file_needed(config.RCS_DELTAS_STORE) + which_pass._register_temp_file_needed(config.RCS_DELTAS_INDEX_TABLE) + which_pass._register_temp_file_needed(config.RCS_TREES_FILTERED_STORE) + which_pass._register_temp_file_needed( + config.RCS_TREES_FILTERED_INDEX_TABLE) + + def get_revision_recorder(self): + return InternalRevisionRecorder(self._compress) + + def get_revision_excluder(self): + return InternalRevisionExcluder() + + def start(self): + self._delta_db = IndexedDatabase( + artifact_manager.get_temp_file(config.RCS_DELTAS_STORE), + artifact_manager.get_temp_file(config.RCS_DELTAS_INDEX_TABLE), + DB_OPEN_READ) + self._delta_db.__delitem__ = lambda id: None + self._tree_db = IndexedDatabase( + artifact_manager.get_temp_file(config.RCS_TREES_FILTERED_STORE), + artifact_manager.get_temp_file(config.RCS_TREES_FILTERED_INDEX_TABLE), + DB_OPEN_READ) + ser = StringSerializer() + if self._compress: + ser = CompressingSerializer(ser) + self._co_db = Database( + artifact_manager.get_temp_file(config.CVS_CHECKOUT_DB), DB_OPEN_NEW, + ser) + + # The set of CVSFile instances whose TextRecords have already been + #
read: + self._loaded_files = set() + + # A TextRecordDatabase holding the TextRecords of files that + # currently have live revisions: + self._text_record_db = TextRecordDatabase(self._delta_db, self._co_db) + + def _get_text_record(self, cvs_rev): + """Return the TextRecord instance for CVS_REV. + + If the TextRecords for CVS_REV.cvs_file haven't been loaded yet, + do so now.""" + + if cvs_rev.cvs_file not in self._loaded_files: + for text_record in self._tree_db[cvs_rev.cvs_file.id].itervalues(): + self._text_record_db.add(text_record) + self._loaded_files.add(cvs_rev.cvs_file) + + return self._text_record_db[cvs_rev.id] + + def get_content_stream(self, cvs_rev, suppress_keyword_substitution=False): + """Check out the text for revision CVS_REV from the repository. + + Return the text wrapped in a readable file object. If + SUPPRESS_KEYWORD_SUBSTITUTION is True, any RCS keywords will be + _un_expanded prior to returning the file content. Note that $Log$ + never actually generates a log (which makes test 'requires_cvs()' + fail). + + Revisions may be requested in any order, but if they are not + requested in dependency order the checkout database will become + very large. Revisions may be skipped.
Each revision may be + requested only once.""" + + text = self._get_text_record(cvs_rev).checkout(self._text_record_db) + if suppress_keyword_substitution: + text = re.sub(self._kw_re, r'$\1$', text) + + return cStringIO.StringIO(text) + + def skip_content(self, cvs_rev): + self._get_text_record(cvs_rev).decrement_refcount(self._text_record_db) + + def finish(self): + self._text_record_db.log_leftovers() + + del self._text_record_db + self._delta_db.close() + self._tree_db.close() + self._co_db.close() + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/collect_data.py cvs2svn-2.0.0/cvs2svn_lib/collect_data.py --- cvs2svn-1.5.x/cvs2svn_lib/collect_data.py 2006-09-25 16:56:22.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/collect_data.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,7 +14,38 @@ # history and logs, available at http://cvs2svn.tigris.org/. # ==================================================================== -"""This module contains database facilities used by cvs2svn.""" +"""Data collection classes. + +This module contains the code used to collect data from the CVS +repository. It parses *,v files, recording all useful information +except for the actual file contents (though even the file contents +might be recorded by the RevisionRecorder if one is configured). + +As a *,v file is parsed, the information pertaining to the file is +accumulated in memory, mostly in _RevisionData, _BranchData, and +_TagData objects. 
When parsing is complete, a final pass is made over +the data to create some final dependency links, collect statistics, +etc., then the _*Data objects are converted into CVSItem objects +(CVSRevision, CVSBranch, and CVSTag respectively) and the CVSItems are +dumped into databases. + +During the data collection, persistent unique ids are allocated to +many types of objects: CVSFile, Symbol, and CVSItems. CVSItems are a +special case. CVSItem ids are unique across all CVSItem types, and +the ids are carried over from the corresponding data collection +objects: + + _RevisionData -> CVSRevision + + _BranchData -> CVSBranch + + _TagData -> CVSTag + +In a later pass it is possible to convert tags <-> branches. But even +if this occurs, the new branch or tag uses the same id as the old tag +or branch. + +""" from __future__ import generators @@ -27,26 +58,23 @@ import time from cvs2svn_lib.boolean import * from cvs2svn_lib.set_support import * from cvs2svn_lib import config +from cvs2svn_lib.common import DB_OPEN_NEW from cvs2svn_lib.common import FatalError from cvs2svn_lib.common import warning_prefix from cvs2svn_lib.common import error_prefix -from cvs2svn_lib.common import OP_ADD -from cvs2svn_lib.common import OP_CHANGE -from cvs2svn_lib.common import OP_DELETE from cvs2svn_lib.log import Log from cvs2svn_lib.context import Ctx from cvs2svn_lib.artifact_manager import artifact_manager +from cvs2svn_lib.project import FileInAndOutOfAtticException from cvs2svn_lib.cvs_file import CVSFile -from cvs2svn_lib.line_of_development import Trunk -from cvs2svn_lib.line_of_development import Branch -from cvs2svn_lib.cvs_item import CVSRevision +from cvs2svn_lib.symbol import Symbol +from cvs2svn_lib.symbol import Trunk +from cvs2svn_lib.cvs_item import CVSBranch +from cvs2svn_lib.cvs_item import CVSTag +from cvs2svn_lib.cvs_item import cvs_revision_type_map +from cvs2svn_lib.cvs_file_items import CVSFileItems from cvs2svn_lib.key_generator import KeyGenerator -from 
cvs2svn_lib.database import Database -from cvs2svn_lib.database import SDatabase -from cvs2svn_lib.database import DB_OPEN_NEW -from cvs2svn_lib.cvs_file_database import CVSFileDatabase from cvs2svn_lib.cvs_item_database import NewCVSItemStore -from cvs2svn_lib.symbol import Symbol from cvs2svn_lib.symbol_statistics import SymbolStatisticsCollector from cvs2svn_lib.metadata_database import MetadataDatabase @@ -61,14 +89,13 @@ branch_tag_re = re.compile(r''' $ ''', re.VERBOSE) -# This really only matches standard '1.1.1.*'-style vendor revisions. -# One could conceivably have a file whose default branch is 1.1.3 or -# whatever, or was that at some point in time, with vendor revisions -# 1.1.3.1, 1.1.3.2, etc. But with the default branch gone now (which -# is the only time this regexp gets used), we'd have no basis for -# assuming that the non-standard vendor branch had ever been the -# default branch anyway, so we don't want this to match them anyway. -vendor_revision = re.compile(r'^1\.1\.1\.\d+$') + +def rev_tuple(rev): + """Return a tuple of integers corresponding to revision number REV. + + For example, if REV is '1.2.3.4', then return (1,2,3,4).""" + + return tuple([int(x) for x in rev.split('.')]) def is_trunk_revision(rev): @@ -110,13 +137,10 @@ class _RevisionData: def __init__(self, cvs_rev_id, rev, timestamp, author, state): # The id of this revision: self.cvs_rev_id = cvs_rev_id - # The CVSRevision is not yet known. It will be stored here: - self.cvs_rev = None self.rev = rev self.timestamp = timestamp self.author = author self.original_timestamp = timestamp - self._adjusted = False self.state = state # If this is the first revision on a branch, then this is the @@ -124,15 +148,13 @@ class _RevisionData: self.parent_branch_data = None # The revision number of the parent of this revision along the - # same line of development, if any. - # - # For the first revision R on a branch, we consider the revision - # from which R sprouted to be the 'previous'. 
+ # same line of development, if any. For the first revision R on a + # branch, we consider the revision from which R sprouted to be the + # 'parent'. If this is the root revision in the file's revision + # tree, then this field is None. # # Note that this revision can't be determined arithmetically (due - # to cvsadmin -o, which is why this is necessary). - # - # If the key has no previous revision, then this field is None. + # to cvsadmin -o), which is why this field is necessary. self.parent = None # The revision number of the primary child of this revision (the @@ -141,16 +163,20 @@ class _RevisionData: self.child = None # The _BranchData instances of branches that sprout from this - # revision. It would be inconvenient to initialize it here - # because we would have to scan through all branches known by the - # _SymbolDataCollector to find the ones having us as the parent. - # Instead, this information is filled in by - # _FileDataCollector._resolve_dependencies(). + # revision, sorted in ascending order by branch number. It would + # be inconvenient to initialize it here because we would have to + # scan through all branches known by the _SymbolDataCollector to + # find the ones having us as the parent. Instead, this + # information is filled in by + # _FileDataCollector._resolve_dependencies() and sorted by + # _FileDataCollector._sort_branches(). self.branches_data = [] - # The _SymbolData instances of symbols that are closed by this - # revision. - self.closed_symbols_data = [] + # The revision numbers of the first commits on any branches on + # which commits occurred. This dependency is kept explicitly + # because otherwise a revision-only topological sort would miss + # the dependency that exists via branches_data. + self.branches_revs_data = [] # The _TagData instances of tags that are connected to this # revision. @@ -159,55 +185,81 @@ class _RevisionData: # The id of the metadata record associated with this revision. 
self.metadata_id = None + # True iff this revision was the head revision on a default branch + # at some point (as best we can determine). + self.non_trunk_default_branch_revision = False + + # Iff this is the 1.2 revision at which a non-trunk default branch + # revision was ended, store the number of the last revision on + # the default branch here. + self.default_branch_prev = None + + # Iff this is the last revision of a non-trunk default branch, and + # the branch is followed by a 1.2 revision, then this holds the + # number of the 1.2 revision (namely, '1.2'). + self.default_branch_next = None + # A boolean value indicating whether deltatext was associated with # this revision. self.deltatext_exists = None - def adjust_timestamp(self, timestamp): - self._adjusted = True - self.timestamp = timestamp - - def timestamp_was_adjusted(self): - return self._adjusted + # A token that may be returned from + # RevisionRecorder.record_text(). It can be used by + # RevisionReader to obtain the text again. + self.revision_recorder_token = None - def is_first_on_branch(self): - return not self.parent or self.parent_branch_data is not None + def get_first_on_branch_id(self): + return self.parent_branch_data and self.parent_branch_data.id class _SymbolData: - """Collection area for information about a CVS symbol (branch or tag).""" + """Collection area for information about a symbol in a single CVSFile. + + SYMBOL is an instance of Symbol, undifferentiated as a Branch or a + Tag regardless of whether self is a _BranchData or a _TagData.""" def __init__(self, id, symbol): + """Initialize an object for SYMBOL.""" + + # The unique id that will be used for this particular symbol in + # this particular file. This same id will be used for the CVSItem + # that is derived from this instance. self.id = id + + # An instance of Symbol. 
self.symbol = symbol class _BranchData(_SymbolData): - """Collection area for information about a CVSBranch.""" + """Collection area for information about a Branch in a single CVSFile.""" def __init__(self, id, symbol, branch_number): _SymbolData.__init__(self, id, symbol) + + # The branch number (e.g., '1.5.2') of this branch. self.branch_number = branch_number # The revision number of the revision from which this branch - # sprouts. + # sprouts (e.g., '1.5'). self.parent = self.branch_number[:self.branch_number.rindex(".")] - # The revision number of the first commit on this branch, if any; - # otherwise, None. + # The revision number of the first commit on this branch, if any + # (e.g., '1.5.2.1'); otherwise, None. self.child = None class _TagData(_SymbolData): - """Collection area for information about a CVSTag.""" + """Collection area for information about a Tag in a single CVSFile.""" def __init__(self, id, symbol, rev): _SymbolData.__init__(self, id, symbol) + + # The revision number being tagged (e.g., '1.5.2.3'). 
self.rev = rev -class _SymbolDataCollector: - """Collect information about symbols in a CVSFile.""" +class _SymbolDataCollector(object): + """Collect information about symbols in a single CVSFile.""" def __init__(self, fdc, cvs_file): self.fdc = fdc @@ -239,18 +291,20 @@ class _SymbolDataCollector: branch_data = self.branches_data.get(branch_number) if branch_data is not None: - sys.stderr.write("%s: in '%s':\n" + Log().warn( + "%s: in '%s':\n" " branch '%s' already has name '%s',\n" " cannot also have name '%s', ignoring the latter\n" % (warning_prefix, self.cvs_file.filename, branch_number, - branch_data.symbol.name, name)) + branch_data.symbol.name, name) + ) return branch_data symbol = self.pdc.get_symbol(name) - self.collect_data.symbol_stats.register_branch_creation(symbol) branch_data = _BranchData( - self.collect_data.key_generator.gen_id(), symbol, branch_number) + self.collect_data.key_generator.gen_id(), symbol, branch_number + ) self.branches_data[branch_number] = branch_data return branch_data @@ -262,9 +316,9 @@ class _SymbolDataCollector: """Record that tag NAME refers to the specified REVISION.""" symbol = self.pdc.get_symbol(name) - self.collect_data.symbol_stats.register_tag_creation(symbol) tag_data = _TagData( - self.collect_data.key_generator.gen_id(), symbol, revision) + self.collect_data.key_generator.gen_id(), symbol, revision + ) self.tags_data.setdefault(revision, []).append(tag_data) return tag_data @@ -282,10 +336,10 @@ class _SymbolDataCollector: # Check that the symbol is not already defined, which can easily # happen when --symbol-transform is used: if name in self._known_symbols: - err = "%s: Multiple definitions of the symbol '%s' in '%s'" \ - % (error_prefix, name, self.cvs_file.filename) - sys.stderr.write(err + "\n") - self.collect_data.fatal_errors.append(err) + self.collect_data.record_fatal_error( + "Multiple definitions of the symbol '%s' in '%s'" + % (name, self.cvs_file.filename) + ) return self._known_symbols.add(name) @@ 
-297,46 +351,41 @@ class _SymbolDataCollector: else: self._add_tag(name, revision) - def rev_to_branch_data(self, revision): - """Return the branch_data of the branch on which REVISION lies. + def rev_to_branch_number(revision): + """Return the branch_number of the branch on which REVISION lies. + REVISION is a branch revision number with an even number of components; for example '1.7.2.1' (never '1.7.2' nor '1.7.0.2'). - For the convenience of callers, REVISION can also be a trunk - revision such as '1.2', in which case just return None.""" + The return value is the branch number (for example, '1.7.2'). + Return None iff REVISION is a trunk revision such as '1.2'.""" if is_trunk_revision(revision): return None - return self.branches_data.get(revision[:revision.rindex(".")]) + return revision[:revision.rindex(".")] - def register_commit(self, rev_data): - """If REV_DATA describes a non-trunk revision number, then record - it as a commit on the corresponding branch. This records the - commit in symbol_stats, which is used to generate statistics for - --force-branch and --force-tag guidance.""" - - rev = rev_data.rev - if is_branch_revision(rev): - branch_number = rev[:rev.rindex(".")] - - branch_data = self.branches_data[branch_number] - - # Register the commit on this non-trunk branch - self.collect_data.symbol_stats.register_branch_commit( - branch_data.symbol) + rev_to_branch_number = staticmethod(rev_to_branch_number) - def register_branch_blockers(self): - for (revision, tag_data_list) in self.tags_data.items(): - if is_branch_revision(revision): - branch_data_parent = self.rev_to_branch_data(revision) - for tag_data in tag_data_list: - self.collect_data.symbol_stats.register_branch_blocker( - branch_data_parent.symbol, tag_data.symbol) + def rev_to_branch_data(self, revision): - """Return the branch_data of the branch on which REVISION lies.
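The new `rev_to_branch_number` staticmethod introduced in this hunk is pure string arithmetic on RCS revision numbers. A minimal standalone sketch (note: `is_trunk_revision` is reimplemented here from its documented behavior; in cvs2svn it lives in a support module):

```python
def is_trunk_revision(revision):
    # Trunk revisions have exactly two components, e.g. '1.2' or '1.35'.
    return revision.count('.') == 1

def rev_to_branch_number(revision):
    # A branch revision such as '1.7.2.1' lies on branch '1.7.2';
    # trunk revisions lie on no branch, so return None for them.
    if is_trunk_revision(revision):
        return None
    return revision[:revision.rindex('.')]

print(rev_to_branch_number('1.7.2.1'))  # -> 1.7.2
print(rev_to_branch_number('1.2'))      # -> None
```

Splitting the number manipulation out of `rev_to_branch_data` lets callers that only need the branch *number* avoid touching `branches_data` at all.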
+ + REVISION must be a branch revision number with an even number of + components; for example '1.7.2.1' (never '1.7.2' nor '1.7.0.2'). + Raise KeyError iff REVISION is unknown.""" - for branch_data_child in self.branches_data.values(): - if is_branch_revision(branch_data_child.parent): - branch_data_parent = self.rev_to_branch_data(branch_data_child.parent) - self.collect_data.symbol_stats.register_branch_blocker( - branch_data_parent.symbol, branch_data_child.symbol) + assert not is_trunk_revision(revision) + + return self.branches_data[self.rev_to_branch_number(revision)] + + def rev_to_lod(self, revision): + """Return the line of development on which REVISION lies. + + REVISION must be a revision number with an even number of + components. Raise KeyError iff REVISION is unknown.""" + + if is_trunk_revision(revision): + return self.pdc.trunk + else: + return self.rev_to_branch_data(revision).symbol class _FileDataCollector(cvs2svn_rcsparse.Sink): @@ -362,51 +411,25 @@ class _FileDataCollector(cvs2svn_rcspars # { revision : _RevisionData instance } self._rev_data = { } - # A list [ revision ] of the revision numbers seen, in the order - # they were given to us by rcsparse: - self._rev_order = [] - # Lists [ (parent, child) ] of revision number pairs indicating # that revision child depends on revision parent along the main # line of development. self._primary_dependencies = [] + # The revision number of the root revision in the dependency tree. + # This is usually '1.1', but could be something else due to + # cvsadmin -o + self._root_rev = None + # If set, this is an RCS branch number -- rcsparse calls this the # "principal branch", but CVS and RCS refer to it as the "default # branch", so that's what we call it, even though the rcsparse API # setter method is still 'set_principal_branch'. self.default_branch = None - # The default RCS branch, if any, for this CVS file. 
- # - # The value is None or a vendor branch revision, such as - # '1.1.1.1', or '1.1.1.2', or '1.1.1.96'. The vendor branch - # revision represents the highest vendor branch revision thought - # to have ever been head of the default branch. - # - # The reason we record a specific vendor revision, rather than a - # default branch number, is that there are two cases to handle: - # - # One case is simple. The RCS file lists a default branch - # explicitly in its header, such as '1.1.1'. In this case, we - # know that every revision on the vendor branch is to be treated - # as head of trunk at that point in time. - # - # But there's also a degenerate case. The RCS file does not - # currently have a default branch, yet we can deduce that for some - # period in the past it probably *did* have one. For example, the - # file has vendor revisions 1.1.1.1 -> 1.1.1.96, all of which are - # dated before 1.2, and then it has 1.1.1.97 -> 1.1.1.100 dated - # after 1.2. In this case, we should record 1.1.1.96 as the last - # vendor revision to have been the head of the default branch. - self.cvs_file_default_branch = None - - # If the RCS file doesn't have a default branch anymore, but does - # have vendor revisions, then we make an educated guess that those - # revisions *were* the head of the default branch up until the - # commit of 1.2, at which point the file's default branch became - # trunk. This records the date at which 1.2 was committed. - self.first_non_vendor_revision_date = None + # True iff revision 1.1 of the file appears to have been imported + # (as opposed to added normally). + self._file_imported = False # A list of rev_data for each revision, in the order that the # corresponding set_revision_info() callback was called. This @@ -415,6 +438,8 @@ class _FileDataCollector(cvs2svn_rcspars # parse_completed(). 
self._revision_data = [] + self.collect_data.revision_recorder.start_file(self.cvs_file) + def _get_rev_id(self, revision): if revision is None: return None @@ -447,28 +472,31 @@ class _FileDataCollector(cvs2svn_rcspars """This is a callback method declared in Sink.""" for branch in branches: - branch_number = branch[:branch.rindex('.')] - - branch_data = self.sdc.branches_data.get(branch_number) - - if branch_data is None: + try: + branch_data = self.sdc.rev_to_branch_data(branch) + except KeyError: # Normally we learn about the branches from the branch names # and numbers parsed from the symbolic name header. But this # must have been an unlabeled branch that slipped through the # net. Generate a name for it and create a _BranchData record # for it now. - branch_data = self.sdc._add_unlabeled_branch(branch_number) + branch_data = self.sdc._add_unlabeled_branch( + self.sdc.rev_to_branch_number(branch)) assert branch_data.child is None branch_data.child = branch + if revision in self._rev_data: + # This revision has already been seen. + raise FatalError( + 'File %r contains duplicate definitions of revision %s.' + % (self.cvs_file.filename, revision,)) + # Record basic information about the revision: - self._rev_data[revision] = _RevisionData( + rev_data = _RevisionData( self.collect_data.key_generator.gen_id(), revision, int(timestamp), author, state) - - # Remember the order that revisions were defined: - self._rev_order.append(revision) + self._rev_data[revision] = rev_data # When on trunk, the RCS 'next' revision number points to what # humans might consider to be the 'previous' revision number. 
For @@ -490,8 +518,8 @@ class _FileDataCollector(cvs2svn_rcspars else: self._primary_dependencies.append( (revision, next,) ) - def _resolve_dependencies(self): - """Store the primary and branch dependencies into the rev_data objects.""" + def _resolve_primary_dependencies(self): + """Resolve the dependencies listed in self._primary_dependencies.""" for (parent, child,) in self._primary_dependencies: parent_data = self._rev_data[parent] @@ -502,323 +530,379 @@ class _FileDataCollector(cvs2svn_rcspars assert child_data.parent is None child_data.parent = parent + def _resolve_branch_dependencies(self): + """Resolve dependencies involving branches.""" + for branch_data in self.sdc.branches_data.values(): # The branch_data's parent has the branch as a child regardless # of whether the branch had any subsequent commits: + try: parent_data = self._rev_data[branch_data.parent] + except KeyError: + Log().warn( + 'In %r:\n' + ' branch %r references non-existing revision %s\n' + ' and will be ignored.' 
+ % (self.cvs_file.filename, branch_data.symbol.name, + branch_data.parent,)) + del self.sdc.branches_data[branch_data.branch_number] + else: parent_data.branches_data.append(branch_data) - if not Ctx().trunk_only and parent_data.child is not None: - closing_data = self._rev_data[parent_data.child] - closing_data.closed_symbols_data.append(branch_data) - # If the branch has a child (i.e., something was committed on # the branch), then we store a reference to the branch_data - # there, and also define the child's parent to be the branch's - # parent: + # there, define the child's parent to be the branch's parent, + # and list the child in the branch parent's branches_revs_data: if branch_data.child is not None: child_data = self._rev_data[branch_data.child] assert child_data.parent_branch_data is None child_data.parent_branch_data = branch_data assert child_data.parent is None child_data.parent = branch_data.parent + parent_data.branches_revs_data.append(branch_data.child) + + def _sort_branches(self): + """Sort the branches sprouting from each revision in creation order. + + Creation order is taken to be the reverse of the order that they + are listed in the symbols part of the RCS file. 
(If a branch is + created then deleted, a later branch can be assigned the recycled + branch number; therefore branch numbers are not an indication of + creation order.)""" + + for rev_data in self._rev_data.values(): + rev_data.branches_data.sort(lambda a, b: - cmp(a.id, b.id)) + + def _resolve_tag_dependencies(self): + """Resolve dependencies involving tags.""" - for tag_data_list in self.sdc.tags_data.values(): + for (rev, tag_data_list) in self.sdc.tags_data.items(): + try: + parent_data = self._rev_data[rev] + except KeyError: + Log().warn( + 'In %r:\n' + ' the following tag(s) reference non-existing revision %s\n' + ' and will be ignored:\n' + ' %s' % ( + self.cvs_file.filename, rev, + ', '.join([repr(tag_data.symbol.name) + for tag_data in tag_data_list]),)) + del self.sdc.tags_data[rev] + else: for tag_data in tag_data_list: + assert tag_data.rev == rev # The tag_data's rev has the tag as a child: - parent_data = self._rev_data[tag_data.rev] parent_data.tags_data.append(tag_data) - if not Ctx().trunk_only and parent_data.child is not None: - closing_data = self._rev_data[parent_data.child] - closing_data.closed_symbols_data.append(tag_data) - - def _update_default_branch(self, rev_data): - """Ratchet up the highest vendor head revision based on REV_DATA, - if necessary.""" + def _determine_root_rev(self): + """Determine self.root_rev. - if self.default_branch: - default_branch_root = self.default_branch + "." - if (rev_data.rev.startswith(default_branch_root) - and default_branch_root.count('.') == rev_data.rev.count('.')): - # This revision is on the default branch, so record that it is - # the new highest default branch head revision. - self.cvs_file_default_branch = rev_data.rev - else: - # No default branch, so make an educated guess. - if rev_data.rev == '1.2': - # This is probably the time when the file stopped having a - # default branch, so make a note of it. 
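The default-branch membership test in the removed `_update_default_branch` code above (prefix match plus equal dot counts) can be read in isolation. A sketch with a hypothetical helper name, equivalent to the deleted check:

```python
def is_on_branch(revision, branch_number):
    # '1.1.1.96' is on branch '1.1.1': the revision starts with the
    # branch number plus a dot and adds exactly one more component
    # (so '1.1.1.5.2.1' is NOT on '1.1.1').
    root = branch_number + '.'
    return revision.startswith(root) and root.count('.') == revision.count('.')

print(is_on_branch('1.1.1.96', '1.1.1'))     # -> True
print(is_on_branch('1.2', '1.1.1'))          # -> False
print(is_on_branch('1.1.1.5.2.1', '1.1.1'))  # -> False
```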
- self.first_non_vendor_revision_date = rev_data.timestamp - else: - if vendor_revision.match(rev_data.rev) \ - and (not self.first_non_vendor_revision_date - or rev_data.timestamp - < self.first_non_vendor_revision_date): - # We're looking at a vendor revision, and it wasn't - # committed after this file lost its default branch, so bump - # the maximum trunk vendor revision in the permanent record. - self.cvs_file_default_branch = rev_data.rev - - def _resync_chain(self, rev_data): - """If the REV_DATA.parent revision exists and it occurred later - than the REV_DATA revision, then shove the previous revision back - in time (and any before it that may need to shift). Return True - iff any resyncing was done. - - We sync backwards and not forwards because any given CVS Revision - has only one previous revision. However, a CVS Revision can *be* - a previous revision for many other revisions (e.g., a revision - that is the source of multiple branches). This becomes relevant - when we do the secondary synchronization in pass 2--we can make - certain that we don't resync a revision earlier than its previous - revision, but it would be non-trivial to make sure that we don't - resync revision R *after* any revisions that have R as a previous - revision.""" - - resynced = False - while rev_data.parent is not None: - prev_rev_data = self._rev_data[rev_data.parent] - - if prev_rev_data.timestamp < rev_data.timestamp: - # No resyncing needed here. 
- return resynced - - old_timestamp = prev_rev_data.timestamp - prev_rev_data.adjust_timestamp(rev_data.timestamp - 1) - resynced = True - delta = prev_rev_data.timestamp - old_timestamp - Log().verbose( - "PASS1 RESYNC: '%s' (%s): old time='%s' delta=%ds" - % (self.cvs_file.cvs_path, prev_rev_data.rev, - time.ctime(old_timestamp), delta)) - if abs(delta) > config.COMMIT_THRESHOLD: - Log().warn( - "%s: Significant timestamp change for '%s' (%d seconds)" - % (warning_prefix, self.cvs_file.cvs_path, delta)) - rev_data = prev_rev_data + We use the fact that it is the only revision without a parent.""" - return resynced + for rev_data in self._rev_data.values(): + if rev_data.parent is None: + assert self._root_rev is None + self._root_rev = rev_data.rev + assert self._root_rev is not None def tree_completed(self): - """The revision tree has been parsed. Analyze it for consistency. + """The revision tree has been parsed. - This is a callback method declared in Sink.""" - - for rev in self._rev_order: - rev_data = self._rev_data[rev] - self.sdc.register_commit(rev_data) - self._update_default_branch(rev_data) + Analyze it for consistency and connect some loose ends. - self._resolve_dependencies() + This is a callback method declared in Sink.""" - # Our algorithm depends upon the timestamps on the revisions occuring - # monotonically over time. That is, we want to see rev 1.34 occur in - # time before rev 1.35. If we inserted 1.35 *first* (due to the time- - # sorting), and then tried to insert 1.34, we'd be screwed. - - # To perform the analysis, we'll simply visit all of the 'previous' - # links that we have recorded and validate that the timestamp on the - # previous revision is before the specified revision. - - # If we have to resync some nodes, then we restart the scan. Just - # keep looping as long as we need to restart. 
- while True: - for rev_data in self._rev_data.values(): - if self._resync_chain(rev_data): - # Abort for loop, causing the scan to start again: - break - else: - # Finished the for-loop without having to resync anything. - # We're done. - return + self._resolve_primary_dependencies() + self._resolve_branch_dependencies() + self._sort_branches() + self._resolve_tag_dependencies() + self._determine_root_rev() def _determine_operation(self, rev_data): - # How to tell if a CVSRevision is an add, a change, or a deletion: - # - # It's a delete if RCS state is 'dead' - # - # It's an add if RCS state is 'Exp.' and - # - we either have no previous revision - # or - # - we have a previous revision whose state is 'dead' - # - # Anything else is a change. prev_rev_data = self._rev_data.get(rev_data.parent) + type = cvs_revision_type_map[( + rev_data.state != 'dead', + prev_rev_data is not None and prev_rev_data.state != 'dead', + )] - if rev_data.state == 'dead': - op = OP_DELETE - elif prev_rev_data is None or prev_rev_data.state == 'dead': - op = OP_ADD - else: - op = OP_CHANGE - - # There can be an odd situation where the tip revision of a branch - # is alive, but every predecessor on the branch is in state 'dead', - # yet the revision from which the branch sprouts is alive. (This - # is sort of a mirror image of the more common case of adding a - # file on a branch, in which the first revision on the branch is - # alive while the revision from which it sprouts is dead.) - # - # In this odd situation, we must mark the first live revision on - # the branch as an OP_CHANGE instead of an OP_ADD, because it - # reflects, however indirectly, a change w.r.t. the source - # revision from which the branch sprouts. - # - # This is issue #89. 
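The rewritten `_determine_operation` in this hunk replaces the old if/elif chain with a lookup keyed on the liveness of the revision and of its predecessor. A table-driven sketch (the string labels are placeholders; the real `cvs_revision_type_map` maps to `CVSRevision` subclasses):

```python
# Keyed on (this revision is alive, predecessor exists and is alive).
cvs_revision_type_map = {
    (True,  False): 'add',      # no live predecessor: the file appears
    (True,  True):  'change',   # live -> live: an ordinary edit
    (False, True):  'delete',   # live -> dead: the file is removed
    (False, False): 'noop',     # dead -> dead: nothing visible happens
}

def determine_operation(state, prev_state):
    # STATE and PREV_STATE are RCS states ('Exp', 'dead', ...);
    # PREV_STATE is None when the revision has no predecessor.
    return cvs_revision_type_map[(
        state != 'dead',
        prev_state is not None and prev_state != 'dead',
    )]

print(determine_operation('Exp', None))    # -> add
print(determine_operation('dead', 'Exp'))  # -> delete
```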
- if is_branch_revision(rev_data.rev) and rev_data.state != 'dead': - cur_rev_data = rev_data - while True: - if cur_rev_data.parent is None: - break - prev_rev_data = self._rev_data[cur_rev_data.parent] - if (not is_same_line_of_development(cur_rev_data.rev, - prev_rev_data.rev) - and cur_rev_data.state == 'dead' - and prev_rev_data.state != 'dead'): - op = OP_CHANGE - cur_rev_data = prev_rev_data - - return op + return type def set_revision_info(self, revision, log, text): """This is a callback method declared in Sink.""" rev_data = self._rev_data[revision] + + if rev_data.metadata_id is not None: + # Users have reported problems with repositories in which the + # deltatext block for revision 1.1 appears twice. It is not + # known whether this results from a CVS/RCS bug, or from botched + # hand-editing of the repository. In any case, empirically, cvs + # and rcs both use the first version when checking out data, so + # that's what we will do. (For the record: "cvs log" fails on + # such a file; "rlog" prints the log message from the first + # block and ignores the second one.) + Log().warn( + "%s: in '%s':\n" + " Deltatext block for revision %s appeared twice;\n" + " ignoring the second occurrence.\n" + % (warning_prefix, self.cvs_file.filename, revision,) + ) + return + + if is_branch_revision(revision): + branch_name = self.sdc.rev_to_branch_data(revision).symbol.name + else: + branch_name = None + rev_data.metadata_id = self.collect_data.metadata_db.get_key( - self.project, rev_data.author, log) + self.project, branch_name, rev_data.author, log) rev_data.deltatext_exists = bool(text) - # "...Give back one kadam to honor the Hebrew God whose Ark this is." - # -- Imam to Indy and Sallah, in 'Raiders of the Lost Ark' - # - # If revision 1.1 appears to have been created via 'cvs add' - # instead of 'cvs import', then this file probably never had a - # default branch, so retroactively remove its record in the - # default branches db. 
The test is that the log message CVS uses - # for 1.1 in imports is "Initial revision\n" with no period. - if revision == '1.1' and log != 'Initial revision\n': - self.cvs_file_default_branch = None + # If this is revision 1.1, determine whether the file appears to + # have been created via 'cvs add' instead of 'cvs import'. The + # test is that the log message CVS uses for 1.1 in imports is + # "Initial revision\n" with no period. (This fact helps determine + # whether this file might have had a default branch in the past.) + if revision == '1.1': + self._file_imported = (log == 'Initial revision\n') self._revision_data.append(rev_data) - def _is_default_branch_revision(self, rev_data): - """Return True iff REV_DATA.rev is a default branch revision.""" + rev_data.revision_recorder_token = \ + self.collect_data.revision_recorder.record_text( + self._rev_data, revision, log, text) - val = self.cvs_file_default_branch - if val is not None: - val_last_dot = val.rindex(".") - our_last_dot = rev_data.rev.rindex(".") - default_branch = val[:val_last_dot] - our_branch = rev_data.rev[:our_last_dot] - default_rev_component = int(val[val_last_dot + 1:]) - our_rev_component = int(rev_data.rev[our_last_dot + 1:]) - if (default_branch == our_branch - and our_rev_component <= default_rev_component): - return True + def _get_rev_1_2(self): + """Return the _RevisionData for the revision playing the role of '1.2'. - return False + Return None if there is no such revision.""" + + rev_1_1 = self._rev_data[self._root_rev] + if rev_1_1.child is None: + return None + else: + return self._rev_data[rev_1_1.child] + + def _get_ntdbr_ids(self): + """Determine whether there are any non-trunk default branch revisions. + + If a non-trunk default branch is determined to have existed, yield + the _RevisionData.ids for all revisions that were once non-trunk + default revisions, in dependency order. + + There are two cases to handle: + + One case is simple. 
The RCS file lists a default branch + explicitly in its header, such as '1.1.1'. In this case, we know + that every revision on the vendor branch is to be treated as head + of trunk at that point in time. + + But there's also a degenerate case. The RCS file does not + currently have a default branch, yet we can deduce that for some + period in the past it probably *did* have one. For example, the + file has vendor revisions 1.1.1.1 -> 1.1.1.96, all of which are + dated before 1.2, and then it has 1.1.1.97 -> 1.1.1.100 dated + after 1.2. In this case, we should record 1.1.1.96 as the last + vendor revision to have been the head of the default branch.""" + + if self.default_branch: + # There is still a default branch; that means that all revisions + # on that branch get marked. + + rev_1_2 = self._get_rev_1_2() + if rev_1_2 is not None: + self.collect_data.record_fatal_error( + 'File has default branch=%s but also a revision %s' + % (self.default_branch, rev_1_2.rev,) + ) + return + + rev = self.sdc.branches_data[self.default_branch].child + while rev: + rev_data = self._rev_data[rev] + yield rev_data.cvs_rev_id + rev = rev_data.child - def _process_revision_data(self, rev_data): - if rev_data.timestamp_was_adjusted(): - # the timestamp on this revision was changed. log it for later - # resynchronization of other files's revisions that occurred - # for this time and log message. - self.collect_data.resync.write( - '%08lx %x %08lx\n' - % (rev_data.original_timestamp, rev_data.metadata_id, - rev_data.timestamp)) - - if is_branch_revision(rev_data.rev): - branch_data = self.sdc.rev_to_branch_data(rev_data.rev) - lod = Branch(branch_data.symbol) + elif self._file_imported: + # No default branch, but the file appears to have been imported. + # So our educated guess is that all revisions on the '1.1.1' + # branch with timestamps prior to the timestamp of '1.2' were + # non-trunk default branch revisions. 
+ # + # This really only processes standard '1.1.1.*'-style vendor + # revisions. One could conceivably have a file whose default + # branch is 1.1.3 or whatever, or was that at some point in + # time, with vendor revisions 1.1.3.1, 1.1.3.2, etc. But with + # the default branch gone now, we'd have no basis for assuming + # that the non-standard vendor branch had ever been the default + # branch anyway. + # + # Note that we rely on comparisons between the timestamps of the + # revisions on the vendor branch and that of revision 1.2, even + # though the timestamps might be incorrect due to clock skew. + # We could do a slightly better job if we used the changeset + # timestamps, as it is possible that the dependencies that went + # into determining those timestamps are more accurate. But that + # would require an extra pass or two. + vendor_branch_data = self.sdc.branches_data.get('1.1.1') + if vendor_branch_data is not None: + rev_1_2 = self._get_rev_1_2() + if rev_1_2 is None: + rev_1_2_timestamp = None else: - lod = Trunk() + rev_1_2_timestamp = rev_1_2.timestamp + + rev = vendor_branch_data.child + + while rev: + rev_data = self._rev_data[rev] + if rev_1_2_timestamp is not None \ + and rev_data.timestamp >= rev_1_2_timestamp: + # That's the end of the once-default branch. 
+ break + yield rev_data.cvs_rev_id + rev = rev_data.child + + def _get_cvs_revision(self, rev_data): + """Create and return a CVSRevision for REV_DATA.""" branch_ids = [ - branch_data.symbol.id + branch_data.id for branch_data in rev_data.branches_data ] + branch_commit_ids = [ + self._get_rev_id(rev) + for rev in rev_data.branches_revs_data + ] + tag_ids = [ - tag_data.symbol.id + tag_data.id for tag_data in rev_data.tags_data ] - closed_symbol_ids = [ - closed_symbol_data.symbol.id - for closed_symbol_data in rev_data.closed_symbols_data - ] + revision_type = self._determine_operation(rev_data) - cvs_rev = CVSRevision( + return revision_type( self._get_rev_id(rev_data.rev), self.cvs_file, rev_data.timestamp, rev_data.metadata_id, self._get_rev_id(rev_data.parent), self._get_rev_id(rev_data.child), - self._determine_operation(rev_data), rev_data.rev, rev_data.deltatext_exists, - lod, - rev_data.is_first_on_branch(), - self._is_default_branch_revision(rev_data), - tag_ids, branch_ids, closed_symbol_ids) - rev_data.cvs_rev = cvs_rev - self.collect_data.add_cvs_item(cvs_rev) + self.sdc.rev_to_lod(rev_data.rev), + rev_data.get_first_on_branch_id(), + rev_data.non_trunk_default_branch_revision, + self._get_rev_id(rev_data.default_branch_prev), + self._get_rev_id(rev_data.default_branch_next), + tag_ids, branch_ids, branch_commit_ids, + rev_data.revision_recorder_token) + + def _get_cvs_revisions(self): + """Generate the CVSRevisions present in this file.""" + + for rev_data in self._revision_data: + yield self._get_cvs_revision(rev_data) + + def _get_cvs_branches(self): + """Generate the CVSBranches present in this file.""" + + for branch_data in self.sdc.branches_data.values(): + yield CVSBranch( + branch_data.id, self.cvs_file, branch_data.symbol, + branch_data.branch_number, + self.sdc.rev_to_lod(branch_data.parent), + self._get_rev_id(branch_data.parent), + self._get_rev_id(branch_data.child), + ) + + def _get_cvs_tags(self): + """Generate the CVSTags present in 
this file.""" + + for tags_data in self.sdc.tags_data.values(): + for tag_data in tags_data: + yield CVSTag( + tag_data.id, self.cvs_file, tag_data.symbol, + self.sdc.rev_to_lod(tag_data.rev), + self._get_rev_id(tag_data.rev), + ) def parse_completed(self): """Finish the processing of this file. - - Create CVSRevisions for all rev_data seen. + This is a callback method declared in Sink.""" - - Walk through all branches and tags and register them with their - parent branch in the symbol database. + pass - This is a callback method declared in Sink.""" + def get_cvs_file_items(self): + """Finish up and return a CVSFileItems instance for this file. - for rev_data in self._revision_data: - self._process_revision_data(rev_data) + Also fix up any non-trunk default branch revisions (if present) by + setting their default_branch_revision members to True and + connecting the last one with revision 1.2. - self.collect_data.add_cvs_file(self.cvs_file) + This method must only be called once. - self.sdc.register_branch_blockers() + """ - # Break a circular linkage, allowing self and sdc to be freed. - del self.sdc + cvs_items = [] + cvs_items.extend(self._get_cvs_revisions()) + cvs_items.extend(self._get_cvs_branches()) + cvs_items.extend(self._get_cvs_tags()) + cvs_file_items = CVSFileItems(self.cvs_file, self.pdc.trunk, cvs_items) + if Log().is_on(Log.DEBUG): + cvs_file_items.check_symbol_parent_lods() -ctrl_characters_regexp = re.compile('[\\\x00-\\\x1f\\\x7f]') + ntdbr_ids = list(self._get_ntdbr_ids()) -def verify_filename_legal(filename): - """Verify that FILENAME does not include any control characters. If - it does, raise a FatalError.""" + # Break a circular reference loop, allowing the memory for self + # and sdc to be freed. + del self.sdc - m = ctrl_characters_regexp.search(filename) - if m: - raise FatalError( - "Character %r in filename %r is not supported by Subversion." 
- % (m.group(), filename,)) + if ntdbr_ids: + rev_1_2 = self._get_rev_1_2() + if rev_1_2 is not None: + rev_1_2_id = rev_1_2.cvs_rev_id + else: + rev_1_2_id = None + cvs_file_items.adjust_ntdbrs(self._file_imported, ntdbr_ids, rev_1_2_id) + + if Log().is_on(Log.DEBUG): + cvs_file_items.check_symbol_parent_lods() + + return cvs_file_items class _ProjectDataCollector: def __init__(self, collect_data, project): self.collect_data = collect_data self.project = project - self.found_valid_file = False - self.fatal_errors = [] + self.found_rcs_file = False self.num_files = 0 + # The Trunk LineOfDevelopment object for this project. + self.trunk = Trunk( + self.collect_data.symbol_key_generator.gen_id(), self.project) + # This causes a record for self.trunk to spring into existence: + self.collect_data.symbol_stats[self.trunk] + # A map { name -> Symbol } for all known symbols in this project. + # The symbols listed here are undifferentiated into Branches and + # Tags because the same name might appear as a branch in one file + # and a tag in another. self.symbols = {} - os.path.walk(self.project.project_cvs_repos_path, - _ProjectDataCollector._visit_directory, self) - if not self.fatal_errors and not self.found_valid_file: - self.fatal_errors.append( - '\n' + self._visit_non_attic_directory(self.project.project_cvs_repos_path) + + if not self.found_rcs_file: + self.collect_data.record_fatal_error( 'No RCS files found under %r!\n' 'Are you absolutely certain you are pointing cvs2svn\n' 'at a CVS repository?\n' - % self.project.project_cvs_repos_path) + % (self.project.project_cvs_repos_path,) + ) def get_symbol(self, name): """Return the Symbol object for the symbol named NAME in this project. 
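The control-character check deleted in the hunk above is not lost: the new directory-walking code calls `self.project.verify_filename_legal` instead. Reconstructed as a standalone sketch, using `ValueError` in place of cvs2svn's `FatalError`:

```python
import re

# Subversion cannot represent paths containing control characters.
ctrl_characters_regexp = re.compile('[\\x00-\\x1f\\x7f]')

def verify_filename_legal(filename):
    m = ctrl_characters_regexp.search(filename)
    if m:
        raise ValueError(
            'Character %r in filename %r is not supported by Subversion.'
            % (m.group(), filename))

verify_filename_legal('README,v')        # passes silently
try:
    verify_filename_legal('bad\x07name,v')
except ValueError:
    print('rejected')                    # -> rejected
```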
@@ -834,42 +918,141 @@ class _ProjectDataCollector: self.symbols[name] = symbol return symbol - def _process_file(self, pathname): - fdc = _FileDataCollector(self, self.project.get_cvs_file(pathname)) - - if not fdc.cvs_file.in_attic: - # If this file also exists in the attic, it's a fatal error - attic_path = os.path.join( - os.path.dirname(pathname), 'Attic', os.path.basename(pathname)) - if os.path.exists(attic_path): - err = "%s: A CVS repository cannot contain both %s and %s" \ - % (error_prefix, pathname, attic_path) - sys.stderr.write(err + '\n') - self.fatal_errors.append(err) - + def _process_file(self, cvs_file): + Log().normal(cvs_file.filename) + fdc = _FileDataCollector(self, cvs_file) try: - cvs2svn_rcsparse.parse(open(pathname, 'rb'), fdc) - except (cvs2svn_rcsparse.common.RCSParseError, ValueError, - RuntimeError): - err = "%s: '%s' is not a valid ,v file" \ - % (error_prefix, pathname) - sys.stderr.write(err + '\n') - self.fatal_errors.append(err) + cvs2svn_rcsparse.parse(open(cvs_file.filename, 'rb'), fdc) + except (cvs2svn_rcsparse.common.RCSParseError, ValueError, RuntimeError): + self.collect_data.record_fatal_error( + "%r is not a valid ,v file" % (cvs_file.filename,) + ) except: - Log().warn("Exception occurred while parsing %s" % pathname) + Log().warn("Exception occurred while parsing %s" % cvs_file.filename) raise + else: self.num_files += 1 - def _visit_directory(self, dirname, files): - for fname in files: - verify_filename_legal(fname) - if not fname.endswith(',v'): - continue - self.found_valid_file = True + cvs_file_items = fdc.get_cvs_file_items() + + del fdc + + # Remove CVSRevisionDeletes that are not needed: + cvs_file_items.remove_unneeded_deletes(self.collect_data.metadata_db) + + # Remove initial branch deletes that are not needed: + cvs_file_items.remove_initial_branch_deletes( + self.collect_data.metadata_db + ) + + # If this is a --trunk-only conversion, discard all branches and + # tags, then draft any non-trunk default 
branch revisions to
+    # trunk:
+    if Ctx().trunk_only:
+      cvs_file_items.exclude_non_trunk()
+
+    self.collect_data.revision_recorder.finish_file(cvs_file_items)
+    self.collect_data.add_cvs_file_items(cvs_file_items)
+    self.collect_data.symbol_stats.register(cvs_file_items)
+
+  def _get_attic_file(self, pathname):
+    """Return a CVSFile object for the Attic file at PATHNAME."""
+
+    try:
+      return self.project.get_cvs_file(pathname)
+    except FileInAndOutOfAtticException, e:
+      if Ctx().retain_conflicting_attic_files:
+        Log().warn(
+            "%s: %s;\n"
+            "   storing the latter into 'Attic' subdirectory.\n"
+            % (warning_prefix, e)
+            )
+      else:
+        self.collect_data.record_fatal_error(str(e))
+
+      # Either way, return a CVSFile object so that the rest of the
+      # file processing can proceed:
+      return self.project.get_cvs_file(pathname, leave_in_attic=True)
+
+  def _visit_attic_directory(self, dirname):
+    # Maps { fname[:-2] : pathname }:
+    rcsfiles = {}
+
+    for fname in os.listdir(dirname):
       pathname = os.path.join(dirname, fname)
-      Log().normal(pathname)
+      if os.path.isdir(pathname):
+        Log().warn("Directory %s found within Attic; ignoring" % (pathname,))
+      elif fname.endswith(',v'):
+        self.found_rcs_file = True
+        rcsfiles[fname[:-2]] = pathname
+        self._process_file(self._get_attic_file(pathname))
+
+    return rcsfiles
+
+  def _get_non_attic_file(self, pathname):
+    """Return a CVSFile object for the non-Attic file at PATHNAME."""
+
+    return self.project.get_cvs_file(pathname)
+
+  def _visit_non_attic_directory(self, dirname):
+    files = os.listdir(dirname)
+
+    # Map { fname[:-2] : pathname }:
+    rcsfiles = {}
 
-    self._process_file(pathname)
+    attic_dir = None
+
+    dirs = []
+
+    for fname in files[:]:
+      pathname = os.path.join(dirname, fname)
+      if os.path.isdir(pathname):
+        if fname == 'Attic':
+          attic_dir = fname
+        else:
+          dirs.append(fname)
+      elif fname.endswith(',v'):
+        self.found_rcs_file = True
+        rcsfiles[fname[:-2]] = pathname
+        self._process_file(self._get_non_attic_file(pathname))
+      else:
+        # Silently ignore other files:
+        pass
+
+    if attic_dir is not None:
+      pathname = os.path.join(dirname, attic_dir)
+      attic_rcsfiles = self._visit_attic_directory(pathname)
+      alldirs = dirs + [attic_dir]
+    else:
+      alldirs = dirs
+      attic_rcsfiles = {}
+
+    # Check for conflicts between directory names and the filenames
+    # that will result from the rcs files (both in this directory and
+    # in attic).  (We recurse into the subdirectories nevertheless, to
+    # try to detect more problems.)
+    for fname in alldirs:
+      pathname = os.path.join(dirname, fname)
+      for rcsfile_list in [rcsfiles, attic_rcsfiles]:
+        if fname in rcsfile_list:
+          self.collect_data.record_fatal_error(
+              'Directory name conflicts with filename.  Please remove or '
+              'rename one\n'
+              'of the following:\n'
+              '    "%s"\n'
+              '    "%s"'
+              % (pathname, rcsfile_list[fname],)
+              )
+
+    # Now recurse into the other subdirectories:
+    for fname in dirs:
+      pathname = os.path.join(dirname, fname)
+
+      # Verify that the directory name does not contain any illegal
+      # characters:
+      self.project.verify_filename_legal(pathname, fname)
+
+      self._visit_non_attic_directory(pathname)
 
 
 class CollectData:
@@ -880,11 +1063,10 @@ class CollectData:
   class by _FileDataCollector instances, one of which is created for
   each file to be parsed."""
 
-  def __init__(self, stats_keeper):
+  def __init__(self, revision_recorder, stats_keeper):
+    self.revision_recorder = revision_recorder
     self._cvs_item_store = NewCVSItemStore(
         artifact_manager.get_temp_file(config.CVS_ITEMS_STORE))
-    self.resync = open(
-        artifact_manager.get_temp_file(config.RESYNC_DATAFILE), 'w')
     self.metadata_db = MetadataDatabase(DB_OPEN_NEW)
     self.fatal_errors = []
     self.num_files = 0
@@ -896,24 +1078,52 @@ class CollectData:
     self.symbol_key_generator = KeyGenerator(1)
 
+    self.revision_recorder.start()
+
+  def record_fatal_error(self, err):
+    """Record that fatal error ERR was found.
+
+    ERR is a string (without trailing newline) describing the error.
+    Output the error to stderr immediately, and record a copy to be
+    output again in a summary at the end of CollectRevsPass."""
+
+    err = '%s: %s' % (error_prefix, err,)
+    sys.stderr.write(err + '\n')
+    self.fatal_errors.append(err)
+
   def process_project(self, project):
     pdc = _ProjectDataCollector(self, project)
     self.num_files += pdc.num_files
-    self.fatal_errors.extend(pdc.fatal_errors)
     Log().verbose('Processed', self.num_files, 'files')
 
-  def add_cvs_file(self, cvs_file):
-    """Store CVS_FILE to _cvs_file_db under its persistent id."""
-
-    Ctx()._cvs_file_db.log_file(cvs_file)
+  def add_cvs_file_items(self, cvs_file_items):
+    """Record the information from CVS_FILE_ITEMS.
 
-  def add_cvs_item(self, cvs_item):
-    self._cvs_item_store.add(cvs_item)
-    if isinstance(cvs_item, CVSRevision):
-      self.stats_keeper.record_cvs_rev(cvs_item)
+    Store the CVSFile to _cvs_file_db under its persistent id, store
+    the CVSItems, and record the CVSItems to self.stats_keeper."""
 
-  def flush(self):
+    Ctx()._cvs_file_db.log_file(cvs_file_items.cvs_file)
+    self._cvs_item_store.add(cvs_file_items)
+    for cvs_item in cvs_file_items.values():
+      self.stats_keeper.record_cvs_item(cvs_item)
+
+  def close(self):
+    """Close the data structures associated with this instance.
+
+    Return a list of fatal errors encountered while processing input.
+    Each list entry is a string describing one fatal error."""
+
+    self.revision_recorder.finish()
+    self.symbol_stats.purge_ghost_symbols()
+    self.symbol_stats.close()
+    self.symbol_stats = None
+    self.metadata_db.close()
+    self.metadata_db = None
     self._cvs_item_store.close()
-    self.symbol_stats.write()
+    self._cvs_item_store = None
+    self.revision_recorder = None
+    retval = self.fatal_errors
+    self.fatal_errors = None
+    return retval
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/common.py cvs2svn-2.0.0/cvs2svn_lib/common.py
--- cvs2svn-1.5.x/cvs2svn_lib/common.py	2006-10-03 16:43:32.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/common.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -18,17 +18,23 @@
 
 import time
+import codecs
 
 from cvs2svn_lib.boolean import *
 from cvs2svn_lib.context import Ctx
 from cvs2svn_lib.log import Log
 
 
+# Always use these constants for opening databases.
+DB_OPEN_READ = 'r'
+DB_OPEN_WRITE = 'w'
+DB_OPEN_NEW = 'n'
+
+
 SVN_INVALID_REVNUM = -1
 
 # Things that can happen to a file.
-OP_NOOP = '-'
 OP_ADD = 'A'
 OP_DELETE = 'D'
 OP_CHANGE = 'C'
@@ -50,6 +56,12 @@ class FatalException(Exception):
   pass
 
 
+class InternalError(Exception):
+  """Exception thrown in the case of a cvs2svn internal error (aka, bug)."""
+
+  pass
+
+
 class FatalError(FatalException):
   """A FatalException that prepends error_prefix to the message."""
 
@@ -114,21 +126,102 @@ def format_date(date):
   return time.strftime("%Y-%m-%dT%H:%M:%S.000000Z", time.gmtime(date))
 
 
-def to_utf8(value, strict=False):
-  """Encode (as Unicode) VALUE, trying the encodings in Ctx().encoding
-  as valid source encodings.  If all of the encodings fail, then
-  encode using Ctx().fallback_encoding if it is configured (unless
-  STRICT is True, in which case raise a UnicodeError)."""
+class UTF8Encoder:
+  """Callable that decodes strings into unicode then encodes them as utf8."""
+
+  def __init__(self, encodings, fallback_encoding=None):
+    """Create a UTF8Encoder instance.
+
+    ENCODINGS is a list containing the names of encodings that are
+    attempted to be used as source encodings in 'strict' mode.
+
+    FALLBACK_ENCODING, if specified, is the name of an encoding that
+    should be used as a source encoding in lossy 'replace' mode if all
+    of ENCODINGS failed.
+
+    Raise LookupError if any of the specified encodings is unknown."""
+
+    self.decoders = [
+        (encoding, codecs.lookup(encoding)[1])
+        for encoding in encodings]
 
-  for encoding in Ctx().encoding:
+    if fallback_encoding is None:
+      self.fallback_decoder = None
+    else:
+      self.fallback_decoder = (
+          fallback_encoding, codecs.lookup(fallback_encoding)[1]
+          )
+
+  def __call__(self, s):
+    """Try to decode 8-bit string S using our configured source encodings.
+
+    Return the string as unicode, encoded in an 8-bit string as utf8.
+
+    Raise UnicodeError if the string cannot be decoded using any of
+    the source encodings and no fallback encoding was specified."""
+
+    for (name, decoder) in self.decoders:
       try:
-        return unicode(value, encoding).encode('utf8')
+        return decoder(s)[0].encode('utf8')
       except ValueError:
-        Log().verbose("Encoding %r failed for string %r" % (encoding, value))
+        Log().verbose("Encoding '%s' failed for string %r" % (name, s))
 
-  if not strict and Ctx().fallback_encoding is not None:
-    return unicode(value, Ctx().fallback_encoding, 'replace').encode('utf8')
+    if self.fallback_decoder is not None:
+      (name, decoder) = self.fallback_decoder
+      return decoder(s, 'replace')[0].encode('utf8')
     else:
       raise UnicodeError
 
 
+class Timestamper:
+  """Return monotonic timestamps derived from changeset timestamps."""
+
+  def __init__(self):
+    # The last timestamp that has been returned:
+    self.timestamp = 0.0
+
+    # The maximum timestamp that is considered reasonable:
+    self.max_timestamp = time.time() + 24.0 * 60.0 * 60.0
+
+  def get(self, timestamp, change_expected):
+    """Return a reasonable timestamp derived from TIMESTAMP.
+
+    Push TIMESTAMP into the future if necessary to ensure that it is
+    at least one second later than every other timestamp that has been
+    returned by previous calls to this method.
+
+    If CHANGE_EXPECTED is not True, then log a message if the
+    timestamp has to be changed."""
+
+    if timestamp > self.max_timestamp:
+      # If a timestamp is in the future, it is assumed that it is
+      # bogus.  Shift it backwards in time to prevent it forcing other
+      # timestamps to be pushed even further in the future.
+
+      # Note that this is not nearly a complete solution to the bogus
+      # timestamp problem.  A timestamp in the future still affects
+      # the ordering of changesets, and a changeset having such a
+      # timestamp will not be committed until all changesets with
+      # earlier timestamps have been committed, even if other
+      # changesets with even earlier timestamps depend on this one.
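The decode-then-reencode strategy of the `UTF8Encoder` class added above (try each source encoding strictly, then fall back to a lossy `'replace'` decode) can be sketched standalone. This is an illustrative Python 3 rewrite, not the patch's Python 2 code; the factory name `make_utf8_encoder` is invented for the sketch:

```python
import codecs

def make_utf8_encoder(encodings, fallback_encoding=None):
    """Return a callable that decodes raw bytes by trying each source
    encoding strictly, then falls back to lossy 'replace' decoding."""
    # Look up the decoders once, up front (raises LookupError for
    # unknown encoding names, as in the original):
    decoders = [(name, codecs.getdecoder(name)) for name in encodings]

    def encode(raw):
        for name, decoder in decoders:
            try:
                return decoder(raw)[0].encode('utf8')
            except ValueError:
                # UnicodeDecodeError is a subclass of ValueError.
                continue
        if fallback_encoding is not None:
            return raw.decode(fallback_encoding, 'replace').encode('utf8')
        raise UnicodeError

    return encode

enc = make_utf8_encoder(['ascii', 'latin-1'])
print(enc(b'caf\xe9'))  # b'caf\xc3\xa9' -- ascii fails, latin-1 succeeds
```

The point of pre-resolving the decoders, as the patch does with `codecs.lookup`, is to fail fast on a misspelled `--encoding` option rather than midway through a long conversion.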
+      self.timestamp = self.timestamp + 1.0
+      if not change_expected:
+        Log().warn(
+            'Timestamp "%s" is in the future; changed to "%s".'
+            % (time.asctime(time.gmtime(timestamp)),
+               time.asctime(time.gmtime(self.timestamp)),)
+            )
+    elif timestamp < self.timestamp + 1.0:
+      self.timestamp = self.timestamp + 1.0
+      if not change_expected and Log().is_on(Log.VERBOSE):
+        Log().verbose(
+            'Timestamp "%s" adjusted to "%s" to ensure monotonicity.'
+            % (time.asctime(time.gmtime(timestamp)),
+               time.asctime(time.gmtime(self.timestamp)),)
+            )
+    else:
+      self.timestamp = timestamp
+
+    return self.timestamp
+
+
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/config.py cvs2svn-2.0.0/cvs2svn_lib/config.py
--- cvs2svn-1.5.x/cvs2svn_lib/config.py	2006-09-16 22:07:20.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/config.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -33,105 +33,187 @@
 CO_EXECUTABLE = 'co'
 CVS_EXECUTABLE = 'cvs'
 SORT_EXECUTABLE = 'sort'
 
-# These files are related to the cleaning and sorting of CVS revisions,
-# for commit grouping.  See design-notes.txt for details.
-CVS_REVS_RESYNC_DATAFILE = 'cvs2svn-revs-resync.txt'
-CVS_REVS_SORTED_DATAFILE = 'cvs2svn-revs-resync-s.txt'
-RESYNC_DATAFILE = 'cvs2svn-resync.txt'
+# The first file contains enough information about each CVSRevision to
+# deduce preliminary Changesets.  The second file is a sorted version
+# of the first.
+CVS_REVS_SUMMARY_DATAFILE = 'revs-summary.txt'
+CVS_REVS_SUMMARY_SORTED_DATAFILE = 'revs-summary-s.txt'
+
+# The first file contains enough information about each CVSSymbol to
+# deduce preliminary Changesets.  The second file is a sorted version
+# of the first.
+CVS_SYMBOLS_SUMMARY_DATAFILE = 'symbols-summary.txt'
+CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE = 'symbols-summary-s.txt'
+
+# A mapping from CVSItem id to Changeset id.
+CVS_ITEM_TO_CHANGESET = 'cvs-item-to-changeset.dat'
+
+# A mapping from CVSItem id to Changeset id, after the
+# RevisionChangeset loops have been broken.
+CVS_ITEM_TO_CHANGESET_REVBROKEN = 'cvs-item-to-changeset-revbroken.dat'
+
+# A mapping from CVSItem id to Changeset id, after the SymbolChangeset
+# loops have been broken.
+CVS_ITEM_TO_CHANGESET_SYMBROKEN = 'cvs-item-to-changeset-symbroken.dat'
+
+# A mapping from CVSItem id to Changeset id, after all Changeset
+# loops have been broken.
+CVS_ITEM_TO_CHANGESET_ALLBROKEN = 'cvs-item-to-changeset-allbroken.dat'
+
+# A mapping from id to Changeset.
+CHANGESETS_INDEX = 'changesets-index.dat'
+CHANGESETS_STORE = 'changesets.pck'
+
+# A mapping from id to Changeset, after the RevisionChangeset loops
+# have been broken.
+CHANGESETS_REVBROKEN_INDEX = 'changesets-revbroken-index.dat'
+CHANGESETS_REVBROKEN_STORE = 'changesets-revbroken.pck'
+
+# A mapping from id to Changeset, after the RevisionChangesets have
+# been sorted and converted into OrderedChangesets.
+CHANGESETS_REVSORTED_INDEX = 'changesets-revsorted-index.dat'
+CHANGESETS_REVSORTED_STORE = 'changesets-revsorted.pck'
+
+# A mapping from id to Changeset, after the SymbolChangeset loops have
+# been broken.
+CHANGESETS_SYMBROKEN_INDEX = 'changesets-symbroken-index.dat'
+CHANGESETS_SYMBROKEN_STORE = 'changesets-symbroken.pck'
+
+# A mapping from id to Changeset, after all Changeset loops have been
+# broken.
+CHANGESETS_ALLBROKEN_INDEX = 'changesets-allbroken-index.dat'
+CHANGESETS_ALLBROKEN_STORE = 'changesets-allbroken.pck'
+
+# The RevisionChangesets in commit order.  Each line contains the
+# changeset id and timestamp of one changeset, in hexadecimal, in the
+# order that the changesets should be committed to svn.
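The `Timestamper` class that the common.py hunk above introduces guarantees strictly increasing commit timestamps while treating anything more than a day in the future as bogus. A standalone Python 3 sketch of that logic (class name and the injectable `now` parameter are inventions of this sketch; the original reads the clock itself and also takes a `change_expected` flag for logging):

```python
import time

class MonotonicStamper:
    """Sketch of cvs2svn's Timestamper: each returned timestamp is at
    least one second later than the previous one, and timestamps more
    than 24 hours in the future are treated as bogus."""

    def __init__(self, now=None):
        self.timestamp = 0.0  # last timestamp returned
        now = time.time() if now is None else now
        self.max_timestamp = now + 24.0 * 60.0 * 60.0

    def get(self, timestamp):
        if timestamp > self.max_timestamp:
            # Bogus future timestamp: ignore it, just step forward.
            self.timestamp += 1.0
        elif timestamp < self.timestamp + 1.0:
            # Too close to (or before) the last result: push forward.
            self.timestamp += 1.0
        else:
            self.timestamp = timestamp
        return self.timestamp

s = MonotonicStamper(now=1000.0)
print(s.get(100.0), s.get(100.0), s.get(500.0))  # 100.0 101.0 500.0
```

As the patch's own comment notes, this only repairs the emitted dates; a bogus future timestamp still influences changeset ordering upstream of this class.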
+CHANGESETS_SORTED_DATAFILE = 'changesets-s.txt'
 
 # This file contains a marshalled copy of all the statistics that we
 # gather throughout the various runs of cvs2svn.  The data stored as a
 # marshalled dictionary.
-STATISTICS_FILE = 'cvs2svn-statistics.pck'
+STATISTICS_FILE = 'statistics.pck'
 
-# This text file contains records (1 per line) that describe svn
-# filesystem paths that are the opening and closing source revisions
-# for copies to tags and branches.  The format is as follows:
-#
-#     SYMBOL_ID SVN_REVNUM TYPE BRANCH_ID CVS_FILE_ID
-#
-# Where type is either OPENING or CLOSING.  The SYMBOL_ID and
-# SVN_REVNUM are the primary and secondary sorting criteria for
-# creating SYMBOL_OPENINGS_CLOSINGS_SORTED.  BRANCH_ID is the symbol
-# id of the branch where this opening or closing happened (in hex), or
-# '*' for the default branch.  CVS_FILE_ID is the id of the
-# corresponding CVSFile (in hex).
-SYMBOL_OPENINGS_CLOSINGS = 'cvs2svn-symbolic-names.txt'
-# A sorted version of the above file.
-SYMBOL_OPENINGS_CLOSINGS_SORTED = 'cvs2svn-symbolic-names-s.txt'
-
-# Skeleton version of an svn filesystem.
-# (These supersede and will eventually replace the two above.)
-# See class SVNRepositoryMirror for how these work.
-SVN_MIRROR_REVISIONS_DB = 'cvs2svn-svn-revisions.db'
-SVN_MIRROR_NODES_DB = 'cvs2svn-svn-nodes.db'
+# This text file contains records (1 per line) that describe openings
+# and closings for copies to tags and branches.  The format is as
+# follows:
+#
+#     SYMBOL_ID SVN_REVNUM TYPE CVS_SYMBOL_ID
+#
+# where type is either OPENING or CLOSING.  CVS_SYMBOL_ID is the id of
+# the CVSSymbol whose opening or closing is being described (in hex).
SYMBOL_OPENINGS_CLOSINGS = 'symbolic-names.txt'
+# A sorted version of the above file.  SYMBOL_ID and SVN_REVNUM are
+# the primary and secondary sorting criteria.  It is important that
+# SYMBOL_IDs be located together to make it quick to read them at
+# once.  The order of SVN_REVNUM is only important because it is
+# assumed by some internal consistency checks.
+SYMBOL_OPENINGS_CLOSINGS_SORTED = 'symbolic-names-s.txt'
+
+# Skeleton version of an svn filesystem.  See class
+# SVNRepositoryMirror for how these work.
+SVN_MIRROR_REVISIONS_TABLE = 'svn-revisions.dat'
+SVN_MIRROR_NODES_INDEX_TABLE = 'svn-nodes-index.dat'
+SVN_MIRROR_NODES_STORE = 'svn-nodes.pck'
 
 # Offsets pointing to the beginning of each symbol's records in
 # SYMBOL_OPENINGS_CLOSINGS_SORTED.  This file contains a pickled map
 # from symbol_id to file offset.
-SYMBOL_OFFSETS_DB = 'cvs2svn-symbol-offsets.pck'
+SYMBOL_OFFSETS_DB = 'symbol-offsets.pck'
 
-# Maps CVSRevision.ids (in hex) to lists of symbol ids, where the
-# CVSRevision is the last such that is a source for those symbols.
-# For example, if branch B's number is 1.3.0.2 in this CVS file, and
-# this file's 1.3 is the latest (by date) revision among *all* CVS
-# files that is a source for branch B, then the CVSRevision.id
-# corresponding to this file at 1.3 would list at least the symbol id
-# for branch B in its list.
-SYMBOL_LAST_CVS_REVS_DB = 'cvs2svn-symbol-last-cvs-revs.db'
-
-# Maps CVSFile.id to instance.
-CVS_FILES_DB = 'cvs2svn-cvs-files.db'
-
-# A series of pickles.  The first is a primer.  Each subsequent pickle
-# is lists of all CVSItems applying to a CVSFile.
-CVS_ITEMS_STORE = 'cvs2svn-cvs-items.pck'
-
-# Maps CVSItem.id (in hex) to CVSRevision after resynchronization.
-# The index file contains id->offset, and the second contains the
-# pickled CVSItems at the specified offsets.
-CVS_ITEMS_RESYNC_INDEX_TABLE = 'cvs2svn-cvs-items-resync-index.dat'
-CVS_ITEMS_RESYNC_STORE = 'cvs2svn-cvs-items-resync.pck'
+# Pickled map of CVSFile.id to instance.
+CVS_FILES_DB = 'cvs-files.pck'
+
+# A series of records.  The first is a pickled serializer.  Each
+# subsequent record is a serialized list of all CVSItems applying to a
+# CVSFile.
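The SYMBOL_OFFSETS_DB scheme described above (record the byte offset where each symbol's run of records begins in the sorted openings/closings file, so one symbol can later be read with a single seek) can be sketched as follows. This is an assumption-laden illustration: the helper name and the three-record sample layout are invented here, only the leading SYMBOL_ID field matters:

```python
import io

def build_symbol_offsets(f):
    """Scan a file of sorted 'SYMBOL_ID ...' records (one per line) and
    map each symbol id to the byte offset of its first record."""
    offsets = {}
    while True:
        offset = f.tell()  # remember position before reading the line
        line = f.readline()
        if not line:
            break
        symbol_id = line.split()[0]
        # Only the first record of each symbol's run matters:
        offsets.setdefault(symbol_id, offset)
    return offsets

data = io.BytesIO(b"1a 5 OPENING 3f\n1a 7 CLOSING 3f\n2b 6 OPENING 40\n")
print(build_symbol_offsets(data))  # {b'1a': 0, b'2b': 32}
```

The resulting dict is what the real code pickles into SYMBOL_OFFSETS_DB; grouping each symbol's records contiguously (the stated reason for the sort order) is what makes the single-seek lookup possible.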
+CVS_ITEMS_STORE = 'cvs-items.pck'
+
+# A database of filtered CVSItems.  Excluded symbols have been
+# discarded (and the dependencies of the remaining CVSItems fixed up).
+# These two files are used within an IndexedCVSItemStore; the first is
+# a map id-> offset, and the second contains the pickled CVSItems at
+# the specified offsets.
+CVS_ITEMS_FILTERED_INDEX_TABLE = 'cvs-items-filtered-index.dat'
+CVS_ITEMS_FILTERED_STORE = 'cvs-items-filtered.pck'
+
+# The same as above, but with the CVSItems ordered in groups based on
+# their initial changesets.  CVSItems will usually be accessed one
+# changeset at a time, so this ordering helps disk locality (even
+# though some of the changesets will later be broken up).
+CVS_ITEMS_SORTED_INDEX_TABLE = 'cvs-items-sorted-index.dat'
+CVS_ITEMS_SORTED_STORE = 'cvs-items-sorted.pck'
 
 # A record of all symbolic names that will be processed in the
 # conversion.  This file contains a pickled list of TypedSymbol
 # objects.
-SYMBOL_DB = 'cvs2svn-symbols.pck'
+SYMBOL_DB = 'symbols.pck'
 
 # A pickled list of the statistics for all symbols.  Each entry in the
 # list is an instance of cvs2svn_lib.symbol_statistics._Stats.
-SYMBOL_STATISTICS_LIST = 'cvs2svn-symbol-stats.pck'
+SYMBOL_STATISTICS = 'symbol-statistics.pck'
 
 # These two databases provide a bidirectional mapping between
 # CVSRevision.ids (in hex) and Subversion revision numbers.
 #
-# The first maps CVSRevision.id to a number; the values are not
-# unique.
+# The first maps CVSRevision.id to the SVN revision number of which it
+# is a part (more than one CVSRevision can map to the same SVN
+# revision number).
 #
 # The second maps Subversion revision numbers (as hex strings) to
 # pickled SVNCommit instances.
-CVS_REVS_TO_SVN_REVNUMS = 'cvs2svn-cvs-revs-to-svn-revnums.db'
-SVN_COMMITS_DB = 'cvs2svn-svn-commits.db'
+CVS_REVS_TO_SVN_REVNUMS = 'cvs-revs-to-svn-revnums.dat'
+
+# This database maps Subversion revision numbers to pickled SVNCommit
+# instances.
+SVN_COMMITS_INDEX_TABLE = 'svn-commits-index.dat'
+SVN_COMMITS_STORE = 'svn-commits.pck'
 
 # How many bytes to read at a time from a pipe.  128 kiB should be
 # large enough to be efficient without wasting too much memory.
 PIPE_READ_SIZE = 128 * 1024
 
-# Records the project.id, author, and log message for each changeset.
-# There are two types of mapping: digest -> metadata_id, and
-# metadata_id -> (projet.id, author, logmessage).  The digests are
-# computed in such a way that CVS commits that are eligible to be
-# combined into the same SVN commit are assigned the same digest.
-METADATA_DB = "cvs2svn-metadata.db"
-
-# If this run's output is a repository, then (in the tmpdir) we use
-# a dumpfile of this name for repository loads.
-#
-# If this run's output is a dumpfile, then this is default name of
-# that dumpfile, but in the current directory (unless the user has
-# specified a dumpfile path, of course, in which case it will be
-# wherever the user said).
-DUMPFILE = 'cvs2svn-dump'
+# Records the author and log message for each changeset.  The database
+# contains a map metadata_id -> (author, logmessage).  Each
+# CVSRevision that is eligible to be combined into the same SVN commit
+# is assigned the same id.  Note that the (author, logmessage) pairs
+# are not necessarily all distinct; other data are taken into account
+# when constructing ids.
+METADATA_DB = 'metadata.db'
+
+# The following four databases are used in conjunction with --use-internal-co.
+
+# Records the RCS deltas for all CVS revisions.  The deltas are to be
+# applied forward, i.e. those from trunk are reversed wrt RCS.
+RCS_DELTAS_INDEX_TABLE = 'rcs-deltas-index.dat'
+RCS_DELTAS_STORE = 'rcs-deltas.pck'
+
+# Records the revision tree of each RCS file.  The format is a list of
+# list of integers.  The outer list holds lines of development, the inner list
+# revisions within the LODs, revisions are CVSItem ids.  Branches "closer
+# to the trunk" appear later.  Revisions are sorted by reverse chronological
+# order.  The last revision of each branch is the revision it sprouts from.
+# Revisions that represent deletions at the end of a branch are omitted.
+RCS_TREES_INDEX_TABLE = 'rcs-trees-index.dat'
+RCS_TREES_STORE = 'rcs-trees.pck'
+
+# Records the revision tree of each RCS file after removing revisions
+# belonging to excluded branches.  Note that the branch ordering is arbitrary
+# in this file.
+RCS_TREES_FILTERED_INDEX_TABLE = 'rcs-trees-filtered-index.dat'
+RCS_TREES_FILTERED_STORE = 'rcs-trees-filtered.pck'
+
+# At any given time during OutputPass, holds the full text of each CVS
+# revision that was checked out already and still has descendants that will
+# be checked out.
+CVS_CHECKOUT_DB = 'cvs-checkout.db'
+
+# End of DBs related to --use-internal-co.
+
+# If this run will output directly to a Subversion repository, then
+# this is the name of the file that each revision will temporarily be
+# written to prior to writing it into the repository.
+DUMPFILE = 'svn.dump'
 
 # flush a commit if a 5 minute gap occurs.
 COMMIT_THRESHOLD = 5 * 60
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/context.py cvs2svn-2.0.0/cvs2svn_lib/context.py
--- cvs2svn-1.5.x/cvs2svn_lib/context.py	2006-10-03 16:43:32.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/context.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -42,23 +42,23 @@ class Ctx:
     self.output_option = None
     self.dry_run = False
-    self.use_cvs = False
+    self.revision_reader = None
     self.svnadmin_executable = config.SVNADMIN_EXECUTABLE
-    self.co_executable = config.CO_EXECUTABLE
-    self.cvs_executable = config.CVS_EXECUTABLE
     self.sort_executable = config.SORT_EXECUTABLE
     self.trunk_only = False
     self.prune = True
-    self.encoding = ["ascii"]
-    self.fallback_encoding = None
+    self.utf8_encoder = lambda s: s.decode('ascii').encode('utf8')
+    self.filename_utf8_encoder = lambda s: s.decode('ascii').encode('utf8')
     self.symbol_strategy = None
     self.username = None
     self.svn_property_setters = []
-    self.tmpdir = '.'
+    self.tmpdir = 'cvs2svn-tmp'
     self.skip_cleanup = False
     # A list of Project instances for all projects being converted.
     self.projects = []
     self.cross_project_commits = True
+    self.cross_branch_commits = True
+    self.retain_conflicting_attic_files = False
 
   def add_project(self, project):
     """Add a project to be converted."""
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_commit.py cvs2svn-2.0.0/cvs2svn_lib/cvs_commit.py
--- cvs2svn-1.5.x/cvs2svn_lib/cvs_commit.py	2006-09-10 16:36:26.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/cvs_commit.py	1970-01-01 01:00:00.000000000 +0100
@@ -1,381 +0,0 @@
-# (Be in -*- python -*- mode.)
-#
-# ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
-#
-# This software is licensed as described in the file COPYING, which
-# you should have received as part of this distribution.  The terms
-# are also available at http://subversion.tigris.org/license-1.html.
-# If newer versions of this license are posted there, you may use a
-# newer version instead, at your option.
-#
-# This software consists of voluntary contributions made by many
-# individuals.  For exact contribution history, see the revision
-# history and logs, available at http://cvs2svn.tigris.org/.
-# ====================================================================
-
-"""This module contains the CVSCommit class."""
-
-import time
-
-from cvs2svn_lib.boolean import *
-from cvs2svn_lib.set_support import *
-from cvs2svn_lib import config
-from cvs2svn_lib.common import warning_prefix
-from cvs2svn_lib.common import OP_ADD
-from cvs2svn_lib.common import OP_CHANGE
-from cvs2svn_lib.common import OP_DELETE
-from cvs2svn_lib.context import Ctx
-from cvs2svn_lib.svn_commit import SVNCommit
-from cvs2svn_lib.svn_commit import SVNPrimaryCommit
-from cvs2svn_lib.svn_commit import SVNPreCommit
-from cvs2svn_lib.svn_commit import SVNPostCommit
-from cvs2svn_lib.log import Log
-from cvs2svn_lib.line_of_development import Branch
-
-
-class CVSCommit:
-  """Each instance of this class contains a number of CVS Revisions
-  that correspond to one or more Subversion Commits.  After all CVS
-  Revisions are added to the grouping, calling process_revisions will
-  generate a Subversion Commit (or Commits) for the set of CVS
-  Revisions in the grouping."""
-
-  def __init__(self, metadata_id, author, log):
-    self.metadata_id = metadata_id
-    self.author = author
-    self.log = log
-
-    # Set of other CVSCommits we depend directly upon.
-    self._deps = set()
-
-    # This field remains True until this CVSCommit is moved from the
-    # expired queue to the ready queue.  At that point we stop blocking
-    # other commits.
-    self.pending = True
-
-    # Lists of CVSRevisions
-    self.changes = [ ]
-    self.deletes = [ ]
-
-    # Start out with a t_min higher than any incoming time T, and a
-    # t_max lower than any incoming T.  This way the first T will
-    # push t_min down to T, and t_max up to T, naturally (without any
-    # special-casing), and successive times will then ratchet them
-    # outward as appropriate.
-    self.t_min = 1L<<32
-    self.t_max = 0
-
-    # This will be set to the SVNCommit that occurs in self._commit.
-    self.motivating_commit = None
-
-    # This is a list of all non-primary SVNCommits motivated by the
-    # main commit.  We gather these so that we can set their dates to
-    # the same date as the primary commit.
-    self.secondary_commits = [ ]
-
-    # State for handling default branches.
-    #
-    # Here is a tempting, but ultimately nugatory, bit of logic, which
-    # I share with you so you may appreciate the less attractive, but
-    # refreshingly non-nugatory, logic which follows it:
-    #
-    # If some of the commits in this txn happened on a non-trunk
-    # default branch, then those files will have to be copied into
-    # trunk manually after being changed on the branch (because the
-    # RCS "default branch" appears as head, i.e., trunk, in practice).
-    # As long as those copies don't overwrite any trunk paths that
-    # were also changed in this commit, then we can do the copies in
-    # the same revision, because they won't cover changes that don't
-    # appear anywhere/anywhen else.  However, if some of the trunk dst
-    # paths *did* change in this commit, then immediately copying the
-    # branch changes would lose those trunk mods forever.  So in this
-    # case, we need to do at least that copy in its own revision.  And
-    # for simplicity's sake, if we're creating the new revision for
-    # even one file, then we just do all such copies together in the
-    # new revision.
-    #
-    # Doesn't that sound nice?
-    #
-    # Unfortunately, Subversion doesn't support copies with sources
-    # in the current txn.  All copies must be based in committed
-    # revisions.  Therefore, we generate the above-described new
-    # revision unconditionally.
-    #
-    # This is a list of cvs_revs, and a cvs_rev is appended for each
-    # default branch commit that will need to be copied to trunk (or
-    # deleted from trunk) in some generated revision following the
-    # "regular" revision.
-    self.default_branch_cvs_revisions = [ ]
-
-  def __str__(self):
-    """For convenience only.  The format is subject to change at any time."""
-
-    return 'CVSCommit([%s], [%s])' % (
-        ', '.join([str(change) for change in self.changes]),
-        ', '.join([str(delete) for delete in self.deletes]),)
-
-  def __cmp__(self, other):
-    # Commits should be sorted by t_max.  If both self and other have
-    # the same t_max, break the tie using t_min, and lastly,
-    # metadata_id.  If all those are equal, then compare based on ids,
-    # to ensure that no two instances compare equal.
-    return (cmp(self.t_max, other.t_max)
-            or cmp(self.t_min, other.t_min)
-            or cmp(self.metadata_id, other.metadata_id)
-            or cmp(id(self), id(other)))
-
-  def __hash__(self):
-    return id(self)
-
-  def revisions(self):
-    return self.changes + self.deletes
-
-  def opens_symbol(self, symbol_id):
-    """Return True if any CVSRevision in this commit is on a tag or a
-    branch or is the origin of a tag or branch."""
-
-    for cvs_rev in self.revisions():
-      if cvs_rev.opens_symbol(symbol_id):
-        return True
-    return False
-
-  def add_revision(self, cvs_rev):
-    # Record the time range of this commit.
-    #
-    # ### ISSUE: It's possible, though unlikely, that the time range
-    # of a commit could get gradually expanded to be arbitrarily
-    # longer than COMMIT_THRESHOLD.  I'm not sure this is a huge
-    # problem, and anyway deciding where to break it up would be a
-    # judgement call.  For now, we just print a warning in commit() if
-    # this happens.
-    if cvs_rev.timestamp < self.t_min:
-      self.t_min = cvs_rev.timestamp
-    if cvs_rev.timestamp > self.t_max:
-      self.t_max = cvs_rev.timestamp
-
-    if cvs_rev.op == OP_DELETE:
-      self.deletes.append(cvs_rev)
-    else:
-      # OP_CHANGE or OP_ADD
-      self.changes.append(cvs_rev)
-
-  def add_dependency(self, dep):
-    self._deps.add(dep)
-
-  def resolve_dependencies(self):
-    """Resolve any dependencies that are no longer pending.
-
-    Return True iff this commit has no remaining unresolved dependencies."""
-
-    for dep in list(self._deps):
-      if dep.pending:
-        return False
-      self.t_max = max(self.t_max, dep.t_max + 1)
-      self._deps.remove(dep)
-
-    return True
-
-  def _pre_commit(self, done_symbols):
-    """Generate any SVNCommits that must exist before the main commit.
-
-    DONE_SYMBOLS is a set of symbol ids for which the last source
-    revision has already been seen and for which the
-    CVSRevisionAggregator has already generated a fill SVNCommit.  See
-    self.process_revisions()."""
-
-    # There may be multiple cvs_revs in this commit that would cause
-    # branch B to be filled, but we only want to fill B once.  On the
-    # other hand, there might be multiple branches committed on in
-    # this commit.  Whatever the case, we should count exactly one
-    # commit per branch, because we only fill a branch once per
-    # CVSCommit.  This list tracks which branch_ids we've already
-    # counted.
-    accounted_for_symbol_ids = set()
-
-    def fill_needed(cvs_rev):
-      """Return True iff this is the first commit on a new branch (for
-      this file) and we need to fill the branch; else return False.
-      See comments below for the detailed rules."""
-
-      if not cvs_rev.first_on_branch:
-        # Only commits that are the first on their branch can force fills:
-        return False
-
-      pm = Ctx()._persistence_manager
-      prev_svn_revnum = pm.get_svn_revnum(cvs_rev.prev_id)
-
-      # It should be the case that when we have a file F that is
-      # added on branch B (thus, F on trunk is in state 'dead'), we
-      # generate an SVNCommit to fill B iff the branch has never
-      # been filled before.
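The bookkeeping in `CVSCommit.add_dependency()` / `resolve_dependencies()` above (a commit is blocked while any dependency is still pending; each resolved dependency ratchets `t_max` to at least one second past its own) can be exercised standalone. A minimal Python 3 sketch with a stripped-down stand-in class (the real CVSCommit carries much more state):

```python
class Commit:
    """Stripped-down stand-in for CVSCommit's dependency handling."""

    def __init__(self, t_min, t_max):
        self.pending = True   # True until moved to the ready queue
        self.t_min = t_min
        self.t_max = t_max
        self._deps = set()    # other commits we depend directly upon

    def add_dependency(self, dep):
        self._deps.add(dep)

    def resolve_dependencies(self):
        """Discard resolved deps; return True iff none remain pending."""
        for dep in list(self._deps):
            if dep.pending:
                return False
            # Commit no earlier than one second after the dependency:
            self.t_max = max(self.t_max, dep.t_max + 1)
            self._deps.remove(dep)
        return True

a = Commit(10, 20)
b = Commit(5, 25)
a.add_dependency(b)
assert not a.resolve_dependencies()  # b is still pending
b.pending = False
assert a.resolve_dependencies()
print(a.t_max)  # 26
```

Note the early `return False`: like the original, the method gives up on the first still-pending dependency and re-checks the rest on a later call.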
-      if cvs_rev.op == OP_ADD:
-        # Fill the branch only if it has never been filled before:
-        return cvs_rev.lod.symbol.id not in pm.last_filled
-      elif cvs_rev.op == OP_CHANGE:
-        # We need to fill only if the last commit affecting the file
-        # has not been filled yet:
-        return prev_svn_revnum > pm.last_filled.get(cvs_rev.lod.symbol.id, 0)
-      elif cvs_rev.op == OP_DELETE:
-        # If the previous revision was also a delete, we don't need
-        # to fill it - and there's nothing to copy to the branch, so
-        # we can't anyway.  No one seems to know how to get CVS to
-        # produce the double delete case, but it's been observed.
-        if Ctx()._cvs_items_db[cvs_rev.prev_id].op == OP_DELETE:
-          return False
-        # Other deletes need fills only if the last commit affecting
-        # the file has not been filled yet:
-        return prev_svn_revnum > pm.last_filled.get(cvs_rev.lod.symbol.id, 0)
-
-    for cvs_rev in self.changes + self.deletes:
-      # If a commit is on a branch, we must ensure that the branch
-      # path being committed exists (in HEAD of the Subversion
-      # repository).  If it doesn't exist, we will need to fill the
-      # branch.  After the fill, the path on which we're committing
-      # will exist.
-      if isinstance(cvs_rev.lod, Branch) \
-          and cvs_rev.lod.symbol.id not in accounted_for_symbol_ids \
-          and cvs_rev.lod.symbol.id not in done_symbols \
-          and fill_needed(cvs_rev):
-        symbol = Ctx()._symbol_db.get_symbol(cvs_rev.lod.symbol.id)
-        self.secondary_commits.append(SVNPreCommit(symbol))
-        accounted_for_symbol_ids.add(cvs_rev.lod.symbol.id)
-
-  def _commit(self):
-    """Generates the primary SVNCommit that corresponds to this
-    CVSCommit."""
-
-    def delete_needed(cvs_rev):
-      """Return True iff the specified delete CVS_REV is really needed.
-
-      When a file is added on a branch, CVS not only adds the file on
-      the branch, but generates a trunk revision (typically 1.1) for
-      that file in state 'dead'.  We only want to add this revision if
-      the log message is not the standard cvs fabricated log message."""
-
-      if cvs_rev.prev_id is not None:
-        return True
-
-      # cvs_rev.branch_ids may be empty if the originating branch
-      # has been excluded.
-      if not cvs_rev.branch_ids:
-        return False
-      # FIXME: This message will not match if the RCS file was renamed
-      # manually after it was created.
-      cvs_generated_msg = 'file %s was initially added on branch %s.\n' % (
-          cvs_rev.cvs_file.basename,
-          Ctx()._symbol_db.get_symbol(cvs_rev.branch_ids[0]).name,)
-      author, log_msg = Ctx()._metadata_db[cvs_rev.metadata_id]
-      return log_msg != cvs_generated_msg
-
-    # Generate an SVNCommit unconditionally.  Even if the only change
-    # in this CVSCommit is a deletion of an already-deleted file (that
-    # is, a CVS revision in state 'dead' whose predecessor was also in
-    # state 'dead'), the conversion will still generate a Subversion
-    # revision containing the log message for the second dead
-    # revision, because we don't want to lose that information.
-    needed_deletes = [ cvs_rev
-                       for cvs_rev in self.deletes
-                       if delete_needed(cvs_rev)
-                       ]
-    svn_commit = SVNPrimaryCommit(self.changes + needed_deletes)
-    self.motivating_commit = svn_commit
-
-    for cvs_rev in self.changes:
-      # Only make a change if we need to:
-      if cvs_rev.rev == "1.1.1.1" and not cvs_rev.deltatext_exists:
-        # When 1.1.1.1 has an empty deltatext, the explanation is
-        # almost always that we're looking at an imported file whose
-        # 1.1 and 1.1.1.1 are identical.  On such imports, CVS creates
-        # an RCS file where 1.1 has the content, and 1.1.1.1 has an
-        # empty deltatext, i.e, the same content as 1.1.  There's no
-        # reason to reflect this non-change in the repository, so we
-        # want to do nothing in this case.  (If we were really
-        # paranoid, we could make sure 1.1's log message is the
-        # CVS-generated "Initial revision\n", but I think the
-        # conditions above are strict enough.)
-        pass
-      else:
-        if cvs_rev.default_branch_revision:
-          self.default_branch_cvs_revisions.append(cvs_rev)
-
-    for cvs_rev in needed_deletes:
-      if cvs_rev.default_branch_revision:
-        self.default_branch_cvs_revisions.append(cvs_rev)
-
-    # There is a slight chance that we didn't actually register any
-    # CVSRevisions with our SVNCommit (see loop over self.deletes
-    # above), so if we have no CVSRevisions, we don't flush the
-    # svn_commit to disk and roll back our revnum.
-    if svn_commit.cvs_revs:
-      svn_commit.date = self.t_max
-      Ctx()._persistence_manager.put_svn_commit(svn_commit)
-    else:
-      # We will not be flushing this SVNCommit, so rollback the
-      # SVNCommit revision counter.
-      SVNCommit.revnum -= 1
-
-    if not Ctx().trunk_only:
-      for cvs_rev in self.revisions():
-        Ctx()._symbolings_logger.log_revision(cvs_rev, svn_commit.revnum)
-
-  def _post_commit(self):
-    """Generates any SVNCommits that we can perform now that _commit
-    has happened.  That is, handle non-trunk default branches.
-    Sometimes an RCS file has a non-trunk default branch, so a commit
-    on that default branch would be visible in a default CVS checkout
-    of HEAD.  If we don't copy that commit over to Subversion's trunk,
-    then there will be no Subversion tree which corresponds to that
-    CVS checkout.  Of course, in order to copy the path over, we may
-    first need to delete the existing trunk there."""
-
-    # Only generate a commit if we have default branch revs
-    if self.default_branch_cvs_revisions:
-      # Generate an SVNCommit for all of our default branch cvs_revs.
-      svn_commit = SVNPostCommit(self.motivating_commit.revnum,
-                                 self.default_branch_cvs_revisions)
-      for cvs_rev in self.default_branch_cvs_revisions:
-        Ctx()._symbolings_logger.log_default_branch_closing(
-            cvs_rev, svn_commit.revnum)
-      self.secondary_commits.append(svn_commit)
-
-  def process_revisions(self, done_symbols):
-    """Process all the CVSRevisions that this instance has, creating
-    one or more SVNCommits in the process.  Generate fill SVNCommits
-    only for symbols not in DONE_SYMBOLS (avoids unnecessary
-    fills).
-
-    Return the primary SVNCommit that corresponds to this CVSCommit.
-    The returned SVNCommit is the commit that motivated any other
-    SVNCommits generated in this CVSCommit."""
-
-    seconds = self.t_max - self.t_min + 1
-
-    Log().verbose('-' * 60)
-    Log().verbose('CVS Revision grouping:')
-    if seconds == 1:
-      Log().verbose('  Start time: %s (duration: 1 second)'
-                    % time.ctime(self.t_max))
-    else:
-      Log().verbose('  Start time: %s' % time.ctime(self.t_min))
-      Log().verbose('  End time:   %s (duration: %d seconds)'
-                    % (time.ctime(self.t_max), seconds))
-
-    if seconds > config.COMMIT_THRESHOLD + 1:
-      Log().warn('%s: grouping spans more than %d seconds'
-                 % (warning_prefix, config.COMMIT_THRESHOLD))
-
-    if Ctx().trunk_only:
-      # When trunk-only, only do the primary commit:
-      self._commit()
-    else:
-      self._pre_commit(done_symbols)
-      self._commit()
-      self._post_commit()
-
-    for svn_commit in self.secondary_commits:
-      svn_commit.date = self.motivating_commit.date
-      Ctx()._persistence_manager.put_svn_commit(svn_commit)
-
-    return self.motivating_commit
-
-
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_file.py cvs2svn-2.0.0/cvs2svn_lib/cvs_file.py
--- cvs2svn-1.5.x/cvs2svn_lib/cvs_file.py	2006-08-10 16:59:05.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/cvs_file.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -19,6 +19,7 @@
 import os
 
 from cvs2svn_lib.boolean import *
+from cvs2svn_lib.common import path_split
 from cvs2svn_lib.key_generator import KeyGenerator
 from cvs2svn_lib.context import Ctx
 
@@ -29,21 +30,25 @@ class CVSFile(object):
   key_generator = KeyGenerator(1)
 
   def __init__(self, id, project, filename, cvs_path,
-               in_attic, executable, file_size, mode):
+               executable, file_size, mode):
     """Initialize a new CVSFile object.
 
     Arguments:
-      ID          --> (int or None) unique id for this file.  If None,
-                      a new id is generated.
+      ID          --> (int or None) unique id for this file.  If None, a new
+                      id is generated.
       PROJECT     --> (Project) the project containing this file
       FILENAME    --> (string) the filesystem path to the CVS file
-      CVS_PATH    --> (string) the canonical path within the CVS
-                      project (no 'Attic', no ',v', forward slashes)
-      IN_ATTIC    --> (bool) True iff RCS file is in Attic
+      CVS_PATH    --> (string) the canonical path within the CVS project (no
+                      'Attic', no ',v', forward slashes)
       EXECUTABLE  --> (bool) True iff RCS file has executable bit set
       FILE_SIZE   --> (long) size of the RCS file in bytes
-      MODE        --> (string or None) 'kkv', 'kb', etc."""
+      MODE        --> (string or None) 'kkv', 'kb', etc.
+
+    CVS_PATH might contain an 'Attic' component if it should be
+    retained in the SVN repository; i.e., if the same filename exists
+    out of Attic and the --retain-conflicting-attic-files option was
+    specified."""
 
     if id is None:
       self.id = self.key_generator.gen_id()
@@ -53,24 +58,23 @@ class CVSFile(object):
     self.project = project
     self.filename = filename
     self.cvs_path = cvs_path
-    self.in_attic = in_attic
     self.executable = executable
     self.file_size = file_size
     self.mode = mode
 
   def __getstate__(self):
     return (self.id, self.project.id, self.filename, self.cvs_path,
-            self.in_attic, self.executable, self.file_size, self.mode,)
+            self.executable, self.file_size, self.mode,)
 
   def __setstate__(self, state):
     (self.id, project_id, self.filename, self.cvs_path,
-     self.in_attic, self.executable, self.file_size, self.mode,) = state
+     self.executable, self.file_size, self.mode,) = state
     self.project = Ctx().projects[project_id]
 
   def get_basename(self):
-    """Return the last path component of self.filename, minus the ',v'."""
+    """Return the last path component of self.cvs_path."""
 
-    return os.path.basename(self.filename)[:-2]
+    return path_split(self.cvs_path)[1]
 
   basename = property(get_basename)
 
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_file_database.py cvs2svn-2.0.0/cvs2svn_lib/cvs_file_database.py
--- cvs2svn-1.5.x/cvs2svn_lib/cvs_file_database.py	2006-05-25 17:44:59.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/cvs_file_database.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -17,10 +17,15 @@
 
 """This module contains database facilities used by cvs2svn."""
 
+import cPickle
+
 from cvs2svn_lib.boolean import *
 from cvs2svn_lib import config
+from cvs2svn_lib.common import DB_OPEN_READ
+from cvs2svn_lib.common import DB_OPEN_WRITE
+from cvs2svn_lib.common import DB_OPEN_NEW
+from cvs2svn_lib.log import Log
 from cvs2svn_lib.artifact_manager import artifact_manager
-from cvs2svn_lib.database import PDatabase
 
 
 class CVSFileDatabase:
@@ -30,17 +35,36 @@ class CVSFileDatabase:
     """Initialize an instance, opening database in MODE (like the MODE
     argument to Database or anydbm.open())."""
 
-    self.db = PDatabase(artifact_manager.get_temp_file(config.CVS_FILES_DB),
-                        mode)
+    self.mode = mode
+
+    if self.mode == DB_OPEN_NEW:
+      # A map { id : CVSFile }
+      self._cvs_files = {}
+    elif self.mode == DB_OPEN_READ:
+      f = open(artifact_manager.get_temp_file(config.CVS_FILES_DB), 'rb')
+      self._cvs_files = cPickle.load(f)
+    else:
+      raise RuntimeError('Invalid mode %r' % self.mode)
 
   def log_file(self, cvs_file):
     """Add CVS_FILE, a CVSFile instance, to the database."""
 
-    self.db['%x' % cvs_file.id] = cvs_file
+    if self.mode == DB_OPEN_READ:
+      raise RuntimeError('Cannot write items in mode %r' % self.mode)
+
+    self._cvs_files[cvs_file.id] = cvs_file
 
   def get_file(self, id):
     """Return the CVSFile with the specified ID."""
 
-    return self.db['%x' % id]
+    return self._cvs_files[id]
+
+  def close(self):
+    if self.mode == DB_OPEN_NEW:
+      f = open(artifact_manager.get_temp_file(config.CVS_FILES_DB), 'wb')
+      cPickle.dump(self._cvs_files, f, -1)
+      f.close()
+
+    self._cvs_files = None
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_file_items.py cvs2svn-2.0.0/cvs2svn_lib/cvs_file_items.py
--- cvs2svn-1.5.x/cvs2svn_lib/cvs_file_items.py	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/cvs2svn_lib/cvs_file_items.py	2007-08-15 22:53:53.000000000 +0200
@@ -0,0 +1,826 @@
+# (Be in -*- python -*- mode.)
+#
+# ====================================================================
+# Copyright (c) 2006-2007 CollabNet.  All rights reserved.
+#
+# This software is licensed as described in the file COPYING, which
+# you should have received as part of this distribution.  The terms
+# are also available at http://subversion.tigris.org/license-1.html.
+# If newer versions of this license are posted there, you may use a
+# newer version instead, at your option.
+#
+# This software consists of voluntary contributions made by many
+# individuals.  For exact contribution history, see the revision
+# history and logs, available at http://cvs2svn.tigris.org/.
+# ====================================================================
+
+"""This module contains a class to manage the CVSItems related to one file."""
+
+
+from __future__ import generators
+
+import re
+
+from cvs2svn_lib.boolean import *
+from cvs2svn_lib.set_support import *
+from cvs2svn_lib.common import InternalError
+from cvs2svn_lib.common import FatalError
+from cvs2svn_lib.context import Ctx
+from cvs2svn_lib.log import Log
+from cvs2svn_lib.symbol import Trunk
+from cvs2svn_lib.symbol import Branch
+from cvs2svn_lib.symbol import Tag
+from cvs2svn_lib.symbol import ExcludedSymbol
+from cvs2svn_lib.cvs_item import CVSRevision
+from cvs2svn_lib.cvs_item import CVSRevisionModification
+from cvs2svn_lib.cvs_item import CVSRevisionAbsent
+from cvs2svn_lib.cvs_item import CVSRevisionNoop
+from cvs2svn_lib.cvs_item import CVSSymbol
+from cvs2svn_lib.cvs_item import CVSBranch
+from cvs2svn_lib.cvs_item import CVSTag
+from cvs2svn_lib.cvs_item import cvs_revision_type_map
+from cvs2svn_lib.cvs_item import cvs_branch_type_map
+from cvs2svn_lib.cvs_item import cvs_tag_type_map
+
+
+class LODItems(object):
+  def __init__(self, lod, cvs_branch, cvs_revisions, cvs_branches, cvs_tags):
+    # The LineOfDevelopment described by this instance.
+    self.lod = lod
+
+    # The CVSBranch starting this LOD, if any; otherwise, None.
+    self.cvs_branch = cvs_branch
+
+    # The list of CVSRevisions on this LOD, if any.  The CVSRevisions
+    # are listed in dependency order.
+    self.cvs_revisions = cvs_revisions
+
+    # A list of CVSBranches that sprout from this LOD (either from
+    # cvs_branch or from one of the CVSRevisions).
+    self.cvs_branches = cvs_branches
+
+    # A list of CVSTags that sprout from this LOD (either from
+    # cvs_branch or from one of the CVSRevisions).
+    self.cvs_tags = cvs_tags
+
+
+class CVSFileItems(object):
+  def __init__(self, cvs_file, trunk, cvs_items):
+    # The file whose data this instance holds.
+    self.cvs_file = cvs_file
+
+    # The symbol that represents "Trunk" in this file.
+    self.trunk = trunk
+
+    # A map from CVSItem.id to CVSItem:
+    self._cvs_items = {}
+
+    # The cvs_item_id of each root in the CVSItem forest.  (A root is
+    # defined to be any CVSRevision with no prev_id.)
+    self.root_ids = set()
+
+    for cvs_item in cvs_items:
+      self.add(cvs_item)
+      if isinstance(cvs_item, CVSRevision) and cvs_item.prev_id is None:
+        self.root_ids.add(cvs_item.id)
+
+  def __getstate__(self):
+    return (self.cvs_file.id, self.trunk.id, self.values(),)
+
+  def __setstate__(self, state):
+    (cvs_file_id, trunk_id, cvs_items,) = state
+    CVSFileItems.__init__(
+        self, Ctx()._cvs_file_db.get_file(cvs_file_id),
+        Ctx()._symbol_db.get_symbol(trunk_id), cvs_items,
+        )
+
+  def add(self, cvs_item):
+    self._cvs_items[cvs_item.id] = cvs_item
+
+  def __getitem__(self, id):
+    """Return the CVSItem with the specified ID."""
+
+    return self._cvs_items[id]
+
+  def __delitem__(self, id):
+    assert id not in self.root_ids
+    del self._cvs_items[id]
+
+  def values(self):
+    return self._cvs_items.values()
+
+  def _iter_tree(self, lod, cvs_branch, start_id):
+    """Iterate over the tree that starts at the specified line of development.
+
+    LOD is the LineOfDevelopment where the iteration should start.
+    CVS_BRANCH is the CVSBranch instance that starts the LOD if any;
+    otherwise it is None.  ID is the id of the first CVSRevision on
+    this LOD, or None if there are none.
+
+    There are two cases handled by this routine: trunk (where LOD is a
+    Trunk instance, CVS_BRANCH is None, and ID is the id of the 1.1
+    revision) and a branch (where LOD is a Branch instance, CVS_BRANCH
+    is a CVSBranch instance, and ID is either the id of the first
+    CVSRevision on the branch or None if there are no CVSRevisions on
+    the branch).  Note that CVS_BRANCH and ID cannot simultaneously be
+    None.
+
+    Yield an LODItems instance for each line of development."""
+
+    cvs_revisions = []
+    cvs_branches = []
+    cvs_tags = []
+
+    def process_subitems(cvs_item):
+      """Process the branches and tags that are rooted in CVS_ITEM.
+
+      CVS_ITEM can be a CVSRevision or a CVSBranch."""
+
+      for branch_id in cvs_item.branch_ids[:]:
+        # Recurse into the branch:
+        branch = self[branch_id]
+        for lod_items in self._iter_tree(
+              branch.symbol, branch, branch.next_id
+              ):
+          yield lod_items
+        # The caller might have deleted the branch that we just
+        # yielded.  If it is no longer present, then do not add it to
+        # the list of cvs_branches.
+        try:
+          cvs_branches.append(self[branch_id])
+        except KeyError:
+          pass
+
+      for tag_id in cvs_item.tag_ids:
+        cvs_tags.append(self[tag_id])
+
+    if cvs_branch is not None:
+      # Include the symbols sprouting directly from the CVSBranch:
+      for lod_items in process_subitems(cvs_branch):
+        yield lod_items
+
+    id = start_id
+    while id is not None:
+      cvs_rev = self[id]
+      cvs_revisions.append(cvs_rev)
+
+      for lod_items in process_subitems(cvs_rev):
+        yield lod_items
+
+      id = cvs_rev.next_id
+
+    yield LODItems(lod, cvs_branch, cvs_revisions, cvs_branches, cvs_tags)
+
+  def iter_lods(self):
+    """Iterate over LinesOfDevelopment in this file, in depth-first order.
+
+    For each LOD, yield an LODItems instance.  The traversal starts at
+    each root node but returns the LODs in depth-first order.
+
+    It is allowed to modify the CVSFileItems instance while the
+    traversal is occurring, but only in ways that don't affect the
+    tree structure above (i.e., towards the trunk from) the current
+    LOD."""
+
+    # Make a list out of root_ids so that callers can change it:
+    for id in list(self.root_ids):
+      cvs_item = self[id]
+      if isinstance(cvs_item, CVSRevision):
+        # This LOD doesn't have a CVSBranch associated with it.
+        # Either it is Trunk, or it is a branch whose CVSBranch has
+        # been deleted.
+        lod = cvs_item.lod
+        cvs_branch = None
+      elif isinstance(cvs_item, CVSBranch):
+        # This is a Branch that has been severed from the rest of the
+        # tree.
+        lod = cvs_item.symbol
+        id = cvs_item.next_id
+        cvs_branch = cvs_item
+      else:
+        raise InternalError('Unexpected root item: %s' % (cvs_item,))
+
+      for lod_items in self._iter_tree(lod, cvs_branch, id):
+        yield lod_items
+
+  def adjust_ntdbrs(self, file_imported, ntdbr_ids, rev_1_2_id):
+    """Adjust the non-trunk default branch revisions listed in NTDBR_IDS.
+
+    FILE_IMPORTED is a boolean indicating whether this file appears to
+    have been imported, which also means that revision 1.1 has a
+    generated log message that need not be preserved.  NTDBR_IDS is a
+    list of cvs_rev_ids for the revisions that have been determined to
+    be non-trunk default branch revisions.
+
+    The first revision on the default branch is handled strangely by
+    CVS.  If a file is imported (as opposed to being added), CVS
+    creates a 1.1 revision, then creates a vendor branch 1.1.1 based
+    on 1.1, then creates a 1.1.1.1 revision that is identical to the
+    1.1 revision (i.e., its deltatext is empty).  The log message that
+    the user typed when importing is stored with the 1.1.1.1 revision.
+    The 1.1 revision always contains a standard, generated log
+    message, 'Initial revision\n'.
+
+    When we detect a straightforward import like this, we want to
+    handle it by deleting the 1.1 revision (which doesn't contain any
+    useful information) and making 1.1.1.1 into an independent root in
+    the file's dependency tree.  In SVN, 1.1.1.1 will be added
+    directly to the vendor branch with its initial content.  Then in a
+    special 'post-commit', the 1.1.1.1 revision is copied back to
+    trunk.
+
+    If the user imports again to the same vendor branch, then CVS
+    creates revisions 1.1.1.2, 1.1.1.3, etc. on the vendor branch,
+    *without* counterparts in trunk (even though these revisions
+    effectively play the role of trunk revisions).  So after we add
+    such revisions to the vendor branch, we also copy them back to
+    trunk in post-commits.
+
+    Set the default_branch_revision members of the revisions listed in
+    NTDBR_IDS to True.  Also, if REV_1_2_ID is not None, then it is
+    the id of revision 1.2.  Set that revision to depend on the last
+    non-trunk default branch revision and possibly adjust its type
+    accordingly."""
+
+    cvs_rev = self[ntdbr_ids[0]]
+
+    if file_imported \
+           and cvs_rev.rev == '1.1.1.1' \
+           and isinstance(cvs_rev, CVSRevisionModification) \
+           and not cvs_rev.deltatext_exists:
+      rev_1_1 = self[cvs_rev.prev_id]
+      Log().debug('Removing unnecessary revision %s' % (rev_1_1,))
+
+      # Delete rev_1_1:
+      self.root_ids.remove(rev_1_1.id)
+      del self[rev_1_1.id]
+      cvs_rev.prev_id = None
+      if rev_1_2_id is not None:
+        rev_1_2 = self[rev_1_2_id]
+        rev_1_2.prev_id = None
+        self.root_ids.add(rev_1_2.id)
+
+      # Delete the 1.1.1 CVSBranch:
+      assert cvs_rev.first_on_branch_id is not None
+      cvs_branch = self[cvs_rev.first_on_branch_id]
+      if cvs_branch.source_id == rev_1_1.id:
+        del self[cvs_branch.id]
+        rev_1_1.branch_ids.remove(cvs_branch.id)
+        rev_1_1.branch_commit_ids.remove(cvs_rev.id)
+        cvs_rev.first_on_branch_id = None
+        self.root_ids.add(cvs_rev.id)
+
+      # Change the type of cvs_rev (typically from Change to Add):
+      cvs_rev.__class__ = cvs_revision_type_map[(
+          isinstance(cvs_rev, CVSRevisionModification),
+          False,
+          )]
+
+      # Move any tags and branches from rev_1_1 to cvs_rev:
+      cvs_rev.tag_ids.extend(rev_1_1.tag_ids)
+      for id in rev_1_1.tag_ids:
+        cvs_tag = self[id]
+        cvs_tag.source_lod = cvs_rev.lod
+        cvs_tag.source_id = cvs_rev.id
+      cvs_rev.branch_ids[0:0] = rev_1_1.branch_ids
+      for id in rev_1_1.branch_ids:
+        cvs_branch = self[id]
+        cvs_branch.source_lod = cvs_rev.lod
+        cvs_branch.source_id = cvs_rev.id
+      cvs_rev.branch_commit_ids[0:0] = rev_1_1.branch_commit_ids
+      for id in rev_1_1.branch_commit_ids:
+        cvs_rev2 = self[id]
+        cvs_rev2.prev_id = cvs_rev.id
+
+    for cvs_rev_id in ntdbr_ids:
+      cvs_rev = self[cvs_rev_id]
+      cvs_rev.default_branch_revision = True
+
+    if rev_1_2_id is not None:
+      # Revision 1.2 logically follows the imported revisions, not
+      # 1.1.  Accordingly, connect it to the last NTDBR and possibly
+      # change its type.
+      rev_1_2 = self[rev_1_2_id]
+      last_ntdbr = self[ntdbr_ids[-1]]
+      rev_1_2.default_branch_prev_id = last_ntdbr.id
+      last_ntdbr.default_branch_next_id = rev_1_2.id
+      rev_1_2.__class__ = cvs_revision_type_map[(
+          isinstance(rev_1_2, CVSRevisionModification),
+          isinstance(last_ntdbr, CVSRevisionModification),
+          )]
+
+  def _delete_unneeded(self, cvs_item, metadata_db):
+    if isinstance(cvs_item, CVSRevisionNoop) \
+           and cvs_item.rev == '1.1' \
+           and isinstance(cvs_item.lod, Trunk) \
+           and len(cvs_item.branch_ids) >= 1 \
+           and self[cvs_item.branch_ids[0]].next_id is not None \
+           and not cvs_item.closed_symbols \
+           and not cvs_item.default_branch_revision:
+      # FIXME: This message will not match if the RCS file was renamed
+      # manually after it was created.
+      author, log_msg = metadata_db[cvs_item.metadata_id]
+      cvs_generated_msg = 'file %s was initially added on branch %s.\n' % (
+          self.cvs_file.basename,
+          self[cvs_item.branch_ids[0]].symbol.name,)
+      return log_msg == cvs_generated_msg
+    else:
+      return False
+
+  def remove_unneeded_deletes(self, metadata_db):
+    """Remove unneeded deletes for this file.
+
+    If a file is added on a branch, then a trunk revision is added at
+    the same time in the 'Dead' state.  This revision doesn't do
+    anything useful, so delete it."""
+
+    for id in self.root_ids:
+      cvs_item = self[id]
+      if self._delete_unneeded(cvs_item, metadata_db):
+        Log().debug('Removing unnecessary delete %s' % (cvs_item,))
+
+        # Delete cvs_item:
+        self.root_ids.remove(cvs_item.id)
+        del self[id]
+        if cvs_item.next_id is not None:
+          cvs_rev_next = self[cvs_item.next_id]
+          cvs_rev_next.prev_id = None
+          self.root_ids.add(cvs_rev_next.id)
+
+        # Delete all CVSBranches rooted at this revision.  If there is
+        # a CVSRevision on the branch, it should already be an add so
+        # it doesn't have to be changed.
+        for cvs_branch_id in cvs_item.branch_ids:
+          cvs_branch = self[cvs_branch_id]
+          del self[cvs_branch.id]
+
+          if cvs_branch.next_id is not None:
+            cvs_branch_next = self[cvs_branch.next_id]
+            cvs_branch_next.first_on_branch_id = None
+            cvs_branch_next.prev_id = None
+            self.root_ids.add(cvs_branch_next.id)
+
+        # Tagging a dead revision doesn't do anything, so remove any
+        # tags that were set on 1.1:
+        for cvs_tag_id in cvs_item.tag_ids:
+          del self[cvs_tag_id]
+
+        # This can only happen once per file, and we might have just
+        # changed self.root_ids, so break out of the loop:
+        break
+
+  def _initial_branch_delete_unneeded(self, lod_items, metadata_db):
+    """Return True iff the initial revision in LOD_ITEMS can be deleted."""
+
+    if lod_items.cvs_branch is not None \
+           and lod_items.cvs_branch.source_id is not None \
+           and len(lod_items.cvs_revisions) >= 2:
+      cvs_revision = lod_items.cvs_revisions[0]
+      cvs_rev_source = self[lod_items.cvs_branch.source_id]
+      if isinstance(cvs_revision, CVSRevisionAbsent) \
+             and not cvs_revision.tag_ids \
+             and not cvs_revision.branch_ids \
+             and abs(cvs_revision.timestamp - cvs_rev_source.timestamp) <= 2:
+        # FIXME: This message will not match if the RCS file was renamed
+        # manually after it was created.
+        author, log_msg = metadata_db[cvs_revision.metadata_id]
+        return bool(re.match(
+            r'file %s was added on branch .* on '
+            r'\d{4}\-\d{2}\-\d{2} \d{2}\:\d{2}\:\d{2}( [\+\-]\d{4})?'
+            '\n' % (re.escape(self.cvs_file.basename),),
+            log_msg,
+            ))
+    return False
+
+  def remove_initial_branch_deletes(self, metadata_db):
+    """If the first revision on a branch is an unnecessary delete, remove it.
+
+    If a file is added on a branch (whether or not it already existed
+    on trunk), then new versions of CVS add a first branch revision in
+    the 'dead' state (to indicate that the file did not exist on the
+    branch when the branch was created) followed by the second branch
+    revision, which is an add.  When we encounter this situation, we
+    sever the branch from trunk and delete the first branch
+    revision."""
+
+    for lod_items in self.iter_lods():
+      if self._initial_branch_delete_unneeded(lod_items, metadata_db):
+        cvs_revision = lod_items.cvs_revisions[0]
+        Log().debug(
+            'Removing unnecessary initial branch delete %s' % (cvs_revision,)
+            )
+        cvs_branch = lod_items.cvs_branch
+        cvs_rev_source = self[cvs_branch.source_id]
+        cvs_rev_next = lod_items.cvs_revisions[1]
+
+        # Delete cvs_revision:
+        del self[cvs_revision.id]
+        cvs_rev_next.prev_id = None
+        self.root_ids.add(cvs_rev_next.id)
+        cvs_rev_source.branch_commit_ids.remove(cvs_revision.id)
+
+        # Delete the CVSBranch on which it is located:
+        del self[cvs_branch.id]
+        cvs_rev_source.branch_ids.remove(cvs_branch.id)
+
+  def _exclude_tag(self, cvs_tag):
+    """Exclude the specified CVS_TAG."""
+
+    del self[cvs_tag.id]
+
+    # A CVSTag is the successor of the CVSRevision that it
+    # sprouts from.  Delete this tag from that revision's
+    # tag_ids:
+    self[cvs_tag.source_id].tag_ids.remove(cvs_tag.id)
+
+  def _exclude_branch(self, lod_items):
+    """Exclude the branch described by LOD_ITEMS, including its revisions.
+
+    (Do not update the LOD_ITEMS instance itself.)
+
+    If the LOD starts with non-trunk default branch revisions, leave
+    them in place and do not delete the branch.  In this case, return
+    True; otherwise return False"""
+
+    if lod_items.cvs_revisions \
+           and lod_items.cvs_revisions[0].default_branch_revision:
+      for cvs_rev in lod_items.cvs_revisions:
+        if not cvs_rev.default_branch_revision:
+          # We've found the first non-NTDBR, and it's stored in cvs_rev:
+          break
+      else:
+        # There was no revision following the NTDBRs:
+        cvs_rev = None
+
+      if cvs_rev:
+        last_ntdbr = self[cvs_rev.prev_id]
+        last_ntdbr.next_id = None
+        while True:
+          del self[cvs_rev.id]
+          if cvs_rev.next_id is None:
+            break
+          cvs_rev = self[cvs_rev.next_id]
+
+      return True
+
+    else:
+      if lod_items.cvs_branch is not None:
+        # Delete the CVSBranch itself:
+        cvs_branch = lod_items.cvs_branch
+
+        del self[cvs_branch.id]
+
+        # A CVSBranch is the successor of the CVSRevision that it
+        # sprouts from.  Delete this branch from that revision's
+        # branch_ids:
+        self[cvs_branch.source_id].branch_ids.remove(cvs_branch.id)
+
+      if lod_items.cvs_revisions:
+        # The first CVSRevision on the branch has to be either detached
+        # from the revision from which the branch sprang, or removed
+        # from self.root_ids:
+        cvs_rev = lod_items.cvs_revisions[0]
+        if cvs_rev.prev_id is None:
+          self.root_ids.remove(cvs_rev.id)
+        else:
+          self[cvs_rev.prev_id].branch_commit_ids.remove(cvs_rev.id)
+
+        for cvs_rev in lod_items.cvs_revisions:
+          del self[cvs_rev.id]
+          # If cvs_rev is the last default revision on a non-trunk
+          # default branch followed by a 1.2 revision, then the 1.2
+          # revision depends on this one.  FIXME: It is questionable
+          # whether this handling is correct, since the non-trunk
+          # default branch revisions affect trunk and should therefore
+          # not just be discarded even if --trunk-only.
+          if cvs_rev.default_branch_next_id is not None:
+            next = self[cvs_rev.default_branch_next_id]
+            assert next.default_branch_prev_id == cvs_rev.id
+            next.default_branch_prev_id = None
+            if next.prev_id is None:
+              self.root_ids.add(next.id)
+
+      return False
+
+  def graft_ntdbr_to_trunk(self):
+    """Graft the non-trunk default branch revisions to trunk.
+
+    They should already be alone on a CVSBranch-less branch."""
+
+    ntdbr_lod_items = None
+    for lod_items in self.iter_lods():
+      if lod_items.cvs_revisions \
+             and lod_items.cvs_revisions[0].default_branch_revision:
+        assert lod_items.cvs_branch is None
+        assert not lod_items.cvs_branches
+        assert not lod_items.cvs_tags
+
+        last_rev = lod_items.cvs_revisions[-1]
+
+        if last_rev.default_branch_next_id is not None:
+          rev_1_2 = self[last_rev.default_branch_next_id]
+          rev_1_2.default_branch_prev_id = None
+          rev_1_2.prev_id = last_rev.id
+          self.root_ids.remove(rev_1_2.id)
+          last_rev.default_branch_next_id = None
+          last_rev.next_id = rev_1_2.id
+          # The type of rev_1_2 was already adjusted in
+          # adjust_ntdbrs(), so we don't have to change its type here.
+
+        for cvs_rev in lod_items.cvs_revisions:
+          cvs_rev.default_branch_revision = False
+          cvs_rev.lod = self.trunk
+
+        for cvs_branch in lod_items.cvs_branches:
+          cvs_branch.source_lod = self.trunk
+
+        for cvs_tag in lod_items.cvs_tags:
+          cvs_tag.source_lod = self.trunk
+
+        return
+
+  def exclude_non_trunk(self):
+    """Delete all tags and branches."""
+
+    ntdbr_excluded = False
+    for lod_items in self.iter_lods():
+      for cvs_tag in lod_items.cvs_tags[:]:
+        self._exclude_tag(cvs_tag)
+        lod_items.cvs_tags.remove(cvs_tag)
+
+      assert not lod_items.cvs_branches
+      assert not lod_items.cvs_tags
+
+      if not isinstance(lod_items.lod, Trunk):
+        ntdbr_excluded |= self._exclude_branch(lod_items)
+
+    if ntdbr_excluded:
+      self.graft_ntdbr_to_trunk()
+
+  def filter_excluded_symbols(self, revision_excluder):
+    """Delete any excluded symbols and references to them.
+
+    Call the revision_excluder's callback methods to let it know what
+    is being excluded."""
+
+    revision_excluder_started = False
+    ntdbr_excluded = False
+    for lod_items in self.iter_lods():
+      # Delete any excluded tags:
+      for cvs_tag in lod_items.cvs_tags[:]:
+        if isinstance(cvs_tag.symbol, ExcludedSymbol):
+          revision_excluder_started = True
+
+          self._exclude_tag(cvs_tag)
+
+          lod_items.cvs_tags.remove(cvs_tag)
+
+      # Delete the whole branch if it is to be excluded:
+      if isinstance(lod_items.lod, ExcludedSymbol):
+        # A symbol can only be excluded if no other symbols spring
+        # from it.  This was already checked in CollateSymbolsPass, so
+        # these conditions should already be satisfied.
+        assert not lod_items.cvs_branches
+        assert not lod_items.cvs_tags
+
+        revision_excluder_started = True
+
+        ntdbr_excluded |= self._exclude_branch(lod_items)
+
+    if ntdbr_excluded:
+      self.graft_ntdbr_to_trunk()
+
+    if revision_excluder_started:
+      revision_excluder.process_file(self)
+    else:
+      revision_excluder.skip_file(self.cvs_file)
+
+  def _mutate_branch_to_tag(self, cvs_branch):
+    """Mutate the branch CVS_BRANCH into a tag."""
+
+    if cvs_branch.next_id is not None:
+      # This shouldn't happen because it was checked in
+      # CollateSymbolsPass:
+      raise FatalError('Attempt to exclude a branch with commits.')
+    cvs_tag = CVSTag(
+        cvs_branch.id, cvs_branch.cvs_file, cvs_branch.symbol,
+        cvs_branch.source_lod, cvs_branch.source_id)
+    self.add(cvs_tag)
+    cvs_revision = self[cvs_tag.source_id]
+    cvs_revision.branch_ids.remove(cvs_tag.id)
+    cvs_revision.tag_ids.append(cvs_tag.id)
+
+  def _mutate_tag_to_branch(self, cvs_tag):
+    """Mutate the tag into a branch."""
+
+    cvs_branch = CVSBranch(
+        cvs_tag.id, cvs_tag.cvs_file, cvs_tag.symbol,
+        None, cvs_tag.source_lod, cvs_tag.source_id, None)
+    self.add(cvs_branch)
+    cvs_revision = self[cvs_branch.source_id]
+    cvs_revision.tag_ids.remove(cvs_branch.id)
+    cvs_revision.branch_ids.append(cvs_branch.id)
+
+  def _mutate_symbol(self, cvs_symbol):
"""Mutate CVS_SYMBOL if necessary.""" + + symbol = cvs_symbol.symbol + if isinstance(cvs_symbol, CVSBranch) and isinstance(symbol, Tag): + self._mutate_branch_to_tag(cvs_symbol) + elif isinstance(cvs_symbol, CVSTag) and isinstance(symbol, Branch): + self._mutate_tag_to_branch(cvs_symbol) + + def mutate_symbols(self): + """Force symbols to be tags/branches based on self.symbol_db.""" + + for cvs_item in self.values(): + if isinstance(cvs_item, CVSRevision): + # This CVSRevision may be affected by the mutation of any + # CVSSymbols that it references, but there is nothing to do + # here directly. + pass + elif isinstance(cvs_item, CVSSymbol): + self._mutate_symbol(cvs_item) + else: + raise RuntimeError('Unknown cvs item type') + + def _adjust_tag_parent(self, cvs_tag): + """Adjust the parent of CVS_TAG if possible and preferred. + + CVS_TAG is an instance of CVSTag. This method must be called in + leaf-to-trunk order.""" + + # The Symbol that cvs_tag would like to have as a parent: + preferred_parent = Ctx()._symbol_db.get_symbol( + cvs_tag.symbol.preferred_parent_id) + + if cvs_tag.source_lod == preferred_parent: + # The preferred parent is already the parent. + return + + # The CVSRevision that is its direct parent: + source = self[cvs_tag.source_id] + assert isinstance(source, CVSRevision) + + if isinstance(preferred_parent, Trunk): + # It is not possible to graft *onto* Trunk: + return + + # Try to find the preferred parent among the possible parents: + for branch_id in source.branch_ids: + if self[branch_id].symbol == preferred_parent: + # We found it! + break + else: + # The preferred parent is not a possible parent in this file. 
+ return + + parent = self[branch_id] + assert isinstance(parent, CVSBranch) + + Log().debug('Grafting %s from %s (on %s) onto %s' % ( + cvs_tag, source, source.lod, parent,)) + # Switch parent: + source.tag_ids.remove(cvs_tag.id) + parent.tag_ids.append(cvs_tag.id) + cvs_tag.source_lod = parent.symbol + cvs_tag.source_id = parent.id + + def _adjust_branch_parents(self, cvs_branch): + """Adjust the parent of CVS_BRANCH if possible and preferred. + + CVS_BRANCH is an instance of CVSBranch. This method must be + called in leaf-to-trunk order.""" + + # The Symbol that cvs_branch would like to have as a parent: + preferred_parent = Ctx()._symbol_db.get_symbol( + cvs_branch.symbol.preferred_parent_id) + + if cvs_branch.source_lod == preferred_parent: + # The preferred parent is already the parent. + return + + # The CVSRevision that is its direct parent: + source = self[cvs_branch.source_id] + # This is always a CVSRevision because we haven't adjusted it yet: + assert isinstance(source, CVSRevision) + + if isinstance(preferred_parent, Trunk): + # It is not possible to graft *onto* Trunk: + return + + # Try to find the preferred parent among the possible parents: + for branch_id in source.branch_ids: + possible_parent = self[branch_id] + if possible_parent.symbol == preferred_parent: + # We found it! + break + elif possible_parent.symbol == cvs_branch.symbol: + # Only branches that precede the branch to be adjusted are + # considered possible parents. Leave parentage unchanged: + return + else: + # This point should never be reached. 
+ raise InternalError( + 'Possible parent search did not terminate as expected') + + parent = possible_parent + assert isinstance(parent, CVSBranch) + + Log().debug('Grafting %s from %s (on %s) onto %s' % ( + cvs_branch, source, source.lod, parent,)) + # Switch parent: + source.branch_ids.remove(cvs_branch.id) + parent.branch_ids.append(cvs_branch.id) + cvs_branch.source_lod = parent.symbol + cvs_branch.source_id = parent.id + + def adjust_parents(self): + """Adjust the parents of symbols to their preferred parents. + + If a CVSSymbol has a preferred parent that is different than its + current parent, and if the preferred parent is an allowed parent + of the CVSSymbol in this file, then graft the CVSSymbol onto its + preferred parent.""" + + for lod_items in self.iter_lods(): + for cvs_tag in lod_items.cvs_tags: + self._adjust_tag_parent(cvs_tag) + + for cvs_branch in lod_items.cvs_branches: + self._adjust_branch_parents(cvs_branch) + + def _get_revision_source(self, cvs_symbol): + """Return the CVSRevision that is the ultimate source of CVS_SYMBOL.""" + + while True: + cvs_item = self[cvs_symbol.source_id] + if isinstance(cvs_item, CVSRevision): + return cvs_item + else: + cvs_symbol = cvs_item + + def refine_symbols(self): + """Refine the types of the CVSSymbols in this file. + + Adjust the symbol types based on whether the source exists: + CVSBranch vs. CVSBranchNoop and CVSTag vs. 
CVSTagNoop.""" + + for lod_items in self.iter_lods(): + for cvs_tag in lod_items.cvs_tags: + source = self._get_revision_source(cvs_tag) + cvs_tag.__class__ = cvs_tag_type_map[ + isinstance(source, CVSRevisionModification) + ] + + for cvs_branch in lod_items.cvs_branches: + source = self._get_revision_source(cvs_branch) + cvs_branch.__class__ = cvs_branch_type_map[ + isinstance(source, CVSRevisionModification) + ] + + def record_opened_symbols(self): + """Set CVSRevision.opened_symbols for the surviving revisions.""" + + for cvs_item in self.values(): + if isinstance(cvs_item, (CVSRevision, CVSBranch)): + cvs_item.opened_symbols = [] + for cvs_symbol_opened_id in cvs_item.get_cvs_symbol_ids_opened(): + cvs_symbol_opened = self[cvs_symbol_opened_id] + cvs_item.opened_symbols.append( + (cvs_symbol_opened.symbol.id, cvs_symbol_opened.id,) + ) + + def record_closed_symbols(self): + """Set CVSRevision.closed_symbols for the surviving revisions. + + A CVSRevision closes the symbols that were opened by the CVSItems + that the CVSRevision closes. Got it? 
+ + This method must be called after record_opened_symbols().""" + + for cvs_item in self.values(): + if isinstance(cvs_item, CVSRevision): + cvs_item.closed_symbols = [] + for cvs_item_closed_id in cvs_item.get_ids_closed(): + cvs_item_closed = self[cvs_item_closed_id] + cvs_item.closed_symbols.extend(cvs_item_closed.opened_symbols) + + def check_symbol_parent_lods(self): + """Do a consistency check that CVSSymbol.source_lod is set correctly.""" + + for cvs_item in self.values(): + if isinstance(cvs_item, CVSSymbol): + source = self[cvs_item.source_id] + if isinstance(source, CVSRevision): + source_lod = source.lod + else: + source_lod = source.symbol + + if cvs_item.source_lod != source_lod: + raise FatalError( + 'source_lod discrepancy for %r: %s != %s' + % (cvs_item, cvs_item.source_lod, source_lod,) + ) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_item.py cvs2svn-2.0.0/cvs2svn_lib/cvs_item.py --- cvs2svn-1.5.x/cvs2svn_lib/cvs_item.py 2006-09-17 23:30:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/cvs_item.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,14 +14,48 @@ # history and logs, available at http://cvs2svn.tigris.org/. # ==================================================================== -"""This module contains classes to store CVS atomic items.""" +"""This module contains classes to store atomic CVS events. +A CVSItem is a single event, pertaining to a single file, that can be +determined to have occurred based on the information in the CVS +repository.
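The `source_lod` consistency check above reduces to a simple invariant: a symbol's recorded source LOD must match the LOD of the item it actually sprouts from. A hedged sketch with toy classes (not the real CVSSymbol/CVSRevision):

```python
# Stand-in model of check_symbol_parent_lods: every symbol records a
# source_lod, and it must agree with the LOD of its source item.
class Source:
    def __init__(self, lod):
        self.lod = lod

class Symbol:
    def __init__(self, source, source_lod):
        self.source = source
        self.source_lod = source_lod

def check(symbols):
    """Raise ValueError on any source_lod discrepancy."""
    for sym in symbols:
        if sym.source_lod != sym.source.lod:
            raise ValueError('source_lod discrepancy')

check([Symbol(Source('trunk'), 'trunk')])     # consistent: no error
try:
    check([Symbol(Source('trunk'), 'branch')])
    raise AssertionError('expected a discrepancy')
except ValueError:
    pass                                      # discrepancy detected
```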
+ +The inheritance tree is as follows: + +CVSItem +| ++--CVSRevision +| | +| +--CVSRevisionModification (* -> 'Exp') +| | | +| | +--CVSRevisionAdd ('dead' -> 'Exp') +| | | +| | +--CVSRevisionChange ('Exp' -> 'Exp') +| | +| +--CVSRevisionAbsent (* -> 'dead') +| | +| +--CVSRevisionDelete ('Exp' -> 'dead') +| | +| +--CVSRevisionNoop ('dead' -> 'dead') +| ++--CVSSymbol + | + +--CVSBranch + | | + | +--CVSBranchNoop + | + +--CVSTag + | + +--CVSTagNoop + +""" + + +from __future__ import generators from cvs2svn_lib.boolean import * -from cvs2svn_lib.common import OP_DELETE +from cvs2svn_lib.set_support import * from cvs2svn_lib.context import Ctx -from cvs2svn_lib.line_of_development import Trunk -from cvs2svn_lib.line_of_development import Branch class CVSItem(object): @@ -29,64 +63,141 @@ class CVSItem(object): self.id = id self.cvs_file = cvs_file + def __eq__(self, other): + return self.id == other.id + + def __cmp__(self, other): + return cmp(self.id, other.id) + + def __hash__(self): + return self.id + def __getstate__(self): raise NotImplementedError() def __setstate__(self, data): raise NotImplementedError() + def get_svn_path(self): + """Return the SVN path associated with this CVSItem.""" + + raise NotImplementedError() + + def get_pred_ids(self): + """Return the CVSItem.ids of direct predecessors of SELF. + + A predecessor is defined to be a CVSItem that has to have been + committed before this one.""" + + raise NotImplementedError() + + def get_succ_ids(self): + """Return the CVSItem.ids of direct successors of SELF. + + A direct successor is defined to be a CVSItem that has this one as + a direct predecessor.""" + + raise NotImplementedError() + + def get_cvs_symbol_ids_opened(self): + """Return an iterable over the ids of CVSSymbols that this item opens. 
+ + The definition of 'open' is that the path corresponding to this + CVSItem will have to be copied when filling the corresponding + symbol.""" + + raise NotImplementedError() + + def get_ids_closed(self): + """Return an iterable over the CVSItem.ids of CVSItems closed by this one. + + A CVSItem A is said to close a CVSItem B if committing A causes B + to be overwritten or deleted (no longer available) in the SVN + repository. This is interesting because it sets the last SVN + revision number from which the contents of B can be copied (for + example, to fill a symbol). See the concrete implementations of + this method for the exact rules about what closes what.""" + + raise NotImplementedError() + + def __repr__(self): + return '%s(%s)' % (self.__class__.__name__, self,) + class CVSRevision(CVSItem): """Information about a single CVS revision. A CVSRevision holds the information known about a single version of - a single file.""" + a single file. + + Members: + ID -- (string) unique ID for this revision. + CVS_FILE -- (CVSFile) CVSFile affected by this revision. + TIMESTAMP -- (int) date stamp for this revision. + METADATA_ID -- (int) id of author + log message record in metadata_db. + PREV_ID -- (int) id of the logically previous CVSRevision, either on the + same or the source branch (or None). + NEXT_ID -- (int) id of the logically next CVSRevision (or None). + REV -- (string) the CVS revision number, e.g., '1.3'. + DELTATEXT_EXISTS -- (bool) true iff this revision's deltatext is not + empty. + LOD -- (LineOfDevelopment) LOD on which this revision occurred. + FIRST_ON_BRANCH_ID -- (int or None) if this revision is the first on its + branch, the cvs_branch_id of that branch; else, None. + DEFAULT_BRANCH_REVISION -- (bool) true iff this is a default branch + revision. + DEFAULT_BRANCH_PREV_ID -- (int or None) Iff this is the 1.2 revision after + the end of a default branch, the id of the last rev on the default + branch; else, None. 
+ DEFAULT_BRANCH_NEXT_ID -- (int or None) Iff this is the last revision on + a default branch preceding a 1.2 rev, the id of the 1.2 revision; + else, None. + TAG_IDS -- (list of int) ids of all CVSTags rooted at this CVSRevision. + BRANCH_IDS -- (list of int) ids of all CVSBranches rooted at this + CVSRevision. + BRANCH_COMMIT_IDS -- (list of int) ids of first CVSRevision committed on + each branch rooted in this revision (for branches with commits). + OPENED_SYMBOLS -- (None or list of (symbol_id, cvs_symbol_id) tuples) + information about all CVSSymbols opened by this revision. This member + is set in FilterSymbolsPass; before then, it is None. + CLOSED_SYMBOLS -- (None or list of (symbol_id, cvs_symbol_id) tuples) + information about all CVSSymbols closed by this revision. This member + is set in FilterSymbolsPass; before then, it is None. + REVISION_RECORDER_TOKEN -- (arbitrary) a token that can be set by + RevisionRecorder for the later use of RevisionReader. + + """ def __init__(self, id, cvs_file, timestamp, metadata_id, prev_id, next_id, - op, rev, deltatext_exists, - lod, first_on_branch, default_branch_revision, - tag_ids, branch_ids, closed_symbol_ids): - """Initialize a new CVSRevision object. - - Arguments: - ID --> (string) unique ID for this revision. 
- CVS_FILE --> (CVSFile) CVSFile affected by this revision - TIMESTAMP --> (int) date stamp for this cvs revision - METADATA_ID --> (int) id of author+logmsg record in metadata_db - PREV_ID --> (int) id of the previous cvs revision (or None) - NEXT_ID --> (int) id of the next cvs revision (or None) - OP --> (char) OP_ADD, OP_CHANGE, or OP_DELETE - REV --> (string) this CVS rev, e.g., '1.3' - DELTATEXT_EXISTS--> (bool) true iff non-empty deltatext - LOD --> (LineOfDevelopment) LOD where this rev occurred - FIRST_ON_BRANCH --> (bool) true iff the first rev on its branch - DEFAULT_BRANCH_REVISION --> (bool) true iff this is a default branch - revision - TAG_IDS --> (list of int) ids of all tags on this revision - BRANCH_IDS --> (list of int) ids of all branches rooted in this - revision - CLOSED_SYMBOL_IDS --> (list of int) ids of all symbols closed by - this revision - """ + rev, deltatext_exists, + lod, first_on_branch_id, default_branch_revision, + default_branch_prev_id, default_branch_next_id, + tag_ids, branch_ids, branch_commit_ids, + revision_recorder_token): + """Initialize a new CVSRevision object.""" CVSItem.__init__(self, id, cvs_file) - self.rev = rev self.timestamp = timestamp self.metadata_id = metadata_id - self.op = op self.prev_id = prev_id self.next_id = next_id + self.rev = rev self.deltatext_exists = deltatext_exists self.lod = lod - self.first_on_branch = first_on_branch + self.first_on_branch_id = first_on_branch_id self.default_branch_revision = default_branch_revision + self.default_branch_prev_id = default_branch_prev_id + self.default_branch_next_id = default_branch_next_id self.tag_ids = tag_ids self.branch_ids = branch_ids - self.closed_symbol_ids = closed_symbol_ids + self.branch_commit_ids = branch_commit_ids + self.opened_symbols = None + self.closed_symbols = None + self.revision_recorder_token = revision_recorder_token def _get_cvs_path(self): return self.cvs_file.cvs_path @@ -94,9 +205,7 @@ class CVSRevision(CVSItem): cvs_path = 
property(_get_cvs_path) def get_svn_path(self): - return self.lod.make_path(self.cvs_file) - - svn_path = property(get_svn_path) + return self.lod.get_path(self.cvs_file.cvs_path) def __getstate__(self): """Return the contents of this instance, for pickling. @@ -104,55 +213,149 @@ class CVSRevision(CVSItem): The presence of this method improves the space efficiency of pickling CVSRevision instances.""" - if isinstance(self.lod, Branch): - lod_id = self.lod.symbol.id - else: - lod_id = None - return ( self.id, self.cvs_file.id, self.timestamp, self.metadata_id, self.prev_id, self.next_id, - self.op, self.rev, self.deltatext_exists, - lod_id, - self.first_on_branch, + self.lod.id, + self.first_on_branch_id, self.default_branch_revision, - ' '.join(['%x' % id for id in self.tag_ids]), - ' '.join(['%x' % id for id in self.branch_ids]), - ' '.join(['%x' % id for id in self.closed_symbol_ids]),) + self.default_branch_prev_id, self.default_branch_next_id, + self.tag_ids, self.branch_ids, self.branch_commit_ids, + self.opened_symbols, self.closed_symbols, + self.revision_recorder_token, + ) def __setstate__(self, data): - (self.id, cvs_file_id, self.timestamp, self.metadata_id, - self.prev_id, self.next_id, self.op, self.rev, + (self.id, cvs_file_id, + self.timestamp, self.metadata_id, + self.prev_id, self.next_id, + self.rev, self.deltatext_exists, - lod_id, self.first_on_branch, self.default_branch_revision, - tag_ids, branch_ids, closed_symbol_ids) = data + lod_id, + self.first_on_branch_id, + self.default_branch_revision, + self.default_branch_prev_id, self.default_branch_next_id, + self.tag_ids, self.branch_ids, self.branch_commit_ids, + self.opened_symbols, self.closed_symbols, + self.revision_recorder_token) = data self.cvs_file = Ctx()._cvs_file_db.get_file(cvs_file_id) - if lod_id is None: - self.lod = Trunk() - else: - self.lod = Branch(Ctx()._symbol_db.get_symbol(lod_id)) - self.tag_ids = [int(s, 16) for s in tag_ids.split()] - self.branch_ids = [int(s, 16) for 
s in branch_ids.split()] - self.closed_symbol_ids = [int(s, 16) for s in closed_symbol_ids.split()] - - def opens_symbol(self, symbol_id): - """Return True iff this CVSRevision is the opening CVSRevision for - SYMBOL_ID (for this RCS file).""" - - if symbol_id in self.tag_ids: - return True - if symbol_id in self.branch_ids: - # If this cvs_rev opens a branch and our op is OP_DELETE, then - # that means that the file that this cvs_rev belongs to was - # created on the branch, so for all intents and purposes, this - # cvs_rev is *technically* not an opening. See Issue #62 for - # more information. - if self.op != OP_DELETE: - return True - return False + self.lod = Ctx()._symbol_db.get_symbol(lod_id) + + def get_symbol_pred_ids(self): + """Return the pred_ids for symbol predecessors.""" + + retval = set() + if self.first_on_branch_id is not None: + retval.add(self.first_on_branch_id) + return retval + + def get_pred_ids(self): + retval = self.get_symbol_pred_ids() + if self.prev_id is not None: + retval.add(self.prev_id) + if self.default_branch_prev_id is not None: + retval.add(self.default_branch_prev_id) + return retval + + def get_symbol_succ_ids(self): + """Return the succ_ids for symbol successors.""" + + retval = set() + for id in self.branch_ids + self.tag_ids: + retval.add(id) + return retval + + def get_succ_ids(self): + retval = self.get_symbol_succ_ids() + if self.next_id is not None: + retval.add(self.next_id) + if self.default_branch_next_id is not None: + retval.add(self.default_branch_next_id) + for id in self.branch_commit_ids: + retval.add(id) + return retval + + def get_ids_closed(self): + # Special handling is needed in the case of non-trunk default + # branches. 
The following cases have to be handled: + # + # Case 1: Revision 1.1 not deleted; revision 1.2 exists: + # + # 1.1 -----------------> 1.2 + # \ ^ ^ / + # \ | | / + # 1.1.1.1 -> 1.1.1.2 + # + # * 1.1.1.1 closes 1.1 (because its post-commit overwrites 1.1 + # on trunk) + # + # * 1.1.1.2 closes 1.1.1.1 + # + # * 1.2 doesn't close anything (the post-commit from 1.1.1.1 + # already closed 1.1, and no symbols can sprout from the + # post-commit of 1.1.1.2) + # + # Case 2: Revision 1.1 not deleted; revision 1.2 does not exist: + # + # 1.1 .................. + # \ ^ ^ + # \ | | + # 1.1.1.1 -> 1.1.1.2 + # + # * 1.1.1.1 closes 1.1 (because its post-commit overwrites 1.1 + # on trunk) + # + # * 1.1.1.2 closes 1.1.1.1 + # + # Case 3: Revision 1.1 deleted; revision 1.2 exists: + # + # ............... 1.2 + # ^ ^ / + # | | / + # 1.1.1.1 -> 1.1.1.2 + # + # * 1.1.1.1 doesn't close anything + # + # * 1.1.1.2 closes 1.1.1.1 + # + # * 1.2 doesn't close anything (no symbols can sprout from the + # post-commit of 1.1.1.2) + # + # Case 4: Revision 1.1 deleted; revision 1.2 doesn't exist: + # + # ............... + # ^ ^ + # | | + # 1.1.1.1 -> 1.1.1.2 + # + # * 1.1.1.1 doesn't close anything + # + # * 1.1.1.2 closes 1.1.1.1 + + if self.first_on_branch_id is not None: + # The first CVSRevision on a branch is considered to close the + # branch: + yield self.first_on_branch_id + if self.default_branch_revision: + # If the 1.1 revision was not deleted, the 1.1.1.1 revision is + # considered to close it: + yield self.prev_id + elif self.default_branch_prev_id is not None: + # This is the special case of a 1.2 revision that follows a + # non-trunk default branch. Either 1.1 was deleted or the first + # default branch revision closed 1.1, so we don't have to close + # 1.1. Technically, we close the revision on trunk that was + # copied from the last non-trunk default branch revision in a + # post-commit, but for now no symbols can sprout from that + # revision so we ignore that one, too. 
+ pass + elif self.prev_id is not None: + # Since this CVSRevision is not the first on a branch, its + # prev_id is on the same LOD and this item closes that one: + yield self.prev_id def __str__(self): """For convenience only. The format is subject to change at any time.""" @@ -160,77 +363,262 @@ class CVSRevision(CVSItem): return '%s:%s<%x>' % (self.cvs_file, self.rev, self.id,) +class CVSRevisionModification(CVSRevision): + """Base class for CVSRevisionAdd or CVSRevisionChange.""" + + def get_cvs_symbol_ids_opened(self): + return self.tag_ids + self.branch_ids + + +class CVSRevisionAdd(CVSRevisionModification): + """A CVSRevision that creates a file that previously didn't exist. + + The file might have never existed on this LOD, or it might have + existed previously but been deleted by a CVSRevisionDelete.""" + + pass + + +class CVSRevisionChange(CVSRevisionModification): + """A CVSRevision that modifies a file that already existed on this LOD.""" + + pass + + +class CVSRevisionAbsent(CVSRevision): + """A CVSRevision for which the file is nonexistent on this LOD.""" + + def get_cvs_symbol_ids_opened(self): + return [] + + +class CVSRevisionDelete(CVSRevisionAbsent): + """A CVSRevision that deletes a file that existed on this LOD.""" + + pass + + +class CVSRevisionNoop(CVSRevisionAbsent): + """A CVSRevision that doesn't do anything. + + The revision was 'dead' and the predecessor either didn't exist or + was also 'dead'. These revisions can't necessarily be thrown away + because (1) they impose ordering constraints on other items; (2) + they might have a nontrivial log message that we don't want to throw + away.""" + + pass + + +# A map +# +# {(nondead(cvs_rev), nondead(prev_cvs_rev)) : cvs_revision_subtype} +# +# , where nondead() means that the cvs revision exists and is not +# 'dead', and CVS_REVISION_SUBTYPE is the subtype of CVSRevision that +# should be used for CVS_REV. 
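The boolean-pair dispatch that the comment above describes can be exercised standalone. The class names mirror those in `cvs_item.py`, but everything below is a toy reimplementation, not the module itself:

```python
# Standalone analogue of cvs_revision_type_map: the pair
# (this revision is non-dead, previous revision was non-dead)
# selects the CVSRevision subtype.
class CVSRevisionNoop:   pass   # 'dead' -> 'dead'
class CVSRevisionDelete: pass   # 'Exp'  -> 'dead'
class CVSRevisionAdd:    pass   # 'dead' -> 'Exp'
class CVSRevisionChange: pass   # 'Exp'  -> 'Exp'

cvs_revision_type_map = {
    (False, False): CVSRevisionNoop,
    (False, True):  CVSRevisionDelete,
    (True, False):  CVSRevisionAdd,
    (True, True):   CVSRevisionChange,
}

def classify(state, prev_state):
    """Map RCS states ('Exp' or 'dead') to the appropriate subtype."""
    return cvs_revision_type_map[(state != 'dead', prev_state != 'dead')]

assert classify('Exp', 'dead') is CVSRevisionAdd
assert classify('dead', 'Exp') is CVSRevisionDelete
```

Encoding the two liveness bits as a tuple key keeps the classification table in one place instead of spreading it over nested conditionals.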
+cvs_revision_type_map = { + (False, False) : CVSRevisionNoop, + (False, True) : CVSRevisionDelete, + (True, False) : CVSRevisionAdd, + (True, True) : CVSRevisionChange, + } + + class CVSSymbol(CVSItem): """Represent a symbol on a particular CVSFile. - This is the base class for CVSBranch and CVSTag.""" + This is the base class for CVSBranch and CVSTag. - def __init__(self, id, cvs_file, symbol, rev_id): - """Initialize a CVSSymbol object. + Members: + ID -- (string) unique ID for this item. + CVS_FILE -- (CVSFile) CVSFile affected by this item. + SYMBOL -- (Symbol) the symbol affected by this CVSSymbol. + SOURCE_LOD -- (LineOfDevelopment) the LOD that is the source for this + CVSSymbol. + SOURCE_ID -- (int) the ID of the CVSRevision or CVSBranch that is the + source for this item. - Arguments: - ID --> (string) unique ID for this item - CVS_FILE --> (CVSFile) CVSFile affected by this revision - SYMBOL --> (Symbol) the corresponding symbol - REV_ID --> (int) the ID of the revision being tagged""" + """ + + def __init__(self, id, cvs_file, symbol, source_lod, source_id): + """Initialize a CVSSymbol object.""" CVSItem.__init__(self, id, cvs_file) self.symbol = symbol - self.rev_id = rev_id + self.source_lod = source_lod + self.source_id = source_id + + def get_svn_path(self): + return self.symbol.get_path(self.cvs_file.cvs_path) + + def get_ids_closed(self): + # A Symbol does not close any other CVSItems: + return [] class CVSBranch(CVSSymbol): - """Represent the creation of a branch in a particular CVSFile.""" + """Represent the creation of a branch in a particular CVSFile. + + Members: + ID -- (string) unique ID for this item. + CVS_FILE -- (CVSFile) CVSFile affected by this item. + SYMBOL -- (Symbol) the symbol affected by this CVSSymbol. + BRANCH_NUMBER -- (string) the number of this branch (e.g., '1.3.4'), or + None if this is a converted CVSTag. + SOURCE_LOD -- (LineOfDevelopment) the LOD that is the source for this + CVSSymbol. 
+ SOURCE_ID -- (int) id of the CVSRevision or CVSBranch from which this + branch sprouts. + NEXT_ID -- (int or None) id of first CVSRevision on this branch, if any; + else, None. + TAG_IDS -- (list of int) ids of all CVSTags rooted at this CVSBranch (can + be set due to parent adjustment in FilterSymbolsPass). + BRANCH_IDS -- (list of int) ids of all CVSBranches rooted at this + CVSBranch (can be set due to parent adjustment in FilterSymbolsPass). + OPENED_SYMBOLS -- (None or list of (symbol_id, cvs_symbol_id) tuples) + information about all CVSSymbols opened by this branch. This member + is set in FilterSymbolsPass; before then, it is None. - def __init__(self, id, cvs_file, symbol, branch_number, rev_id, next_id): - """Initialize a CVSBranch. + """ - Arguments: - ID --> (string) unique ID for this item - CVS_FILE --> (CVSFile) CVSFile affected by this revision - SYMBOL --> (Symbol) the corresponding symbol - BRANCH_NUMBER --> (string) the number of this branch (e.g., "1.3.4") - REV_ID --> (int) id of CVSRevision from which this branch - sprouts - NEXT_ID --> (int or None) id of first rev on this branch""" + def __init__( + self, id, cvs_file, symbol, branch_number, + source_lod, source_id, next_id + ): + """Initialize a CVSBranch.""" - CVSSymbol.__init__(self, id, cvs_file, symbol, rev_id) + CVSSymbol.__init__(self, id, cvs_file, symbol, source_lod, source_id) self.branch_number = branch_number self.next_id = next_id + self.tag_ids = [] + self.branch_ids = [] + self.opened_symbols = None def __getstate__(self): return ( self.id, self.cvs_file.id, - self.symbol.id, self.branch_number, self.rev_id, self.next_id) + self.symbol.id, self.branch_number, + self.source_lod.id, self.source_id, self.next_id, + self.tag_ids, self.branch_ids, + self.opened_symbols, + ) def __setstate__(self, data): - (self.id, cvs_file_id, - symbol_id, self.branch_number, self.rev_id, self.next_id) = data + ( + self.id, cvs_file_id, + symbol_id, self.branch_number, + source_lod_id, 
self.source_id, self.next_id, + self.tag_ids, self.branch_ids, + self.opened_symbols + ) = data self.cvs_file = Ctx()._cvs_file_db.get_file(cvs_file_id) self.symbol = Ctx()._symbol_db.get_symbol(symbol_id) + self.source_lod = Ctx()._symbol_db.get_symbol(source_lod_id) + + def get_pred_ids(self): + return set([self.source_id]) + + def get_succ_ids(self): + retval = set(self.tag_ids + self.branch_ids) + if self.next_id is not None: + retval.add(self.next_id) + return retval + + def get_cvs_symbol_ids_opened(self): + return self.tag_ids + self.branch_ids + + def __str__(self): + """For convenience only. The format is subject to change at any time.""" + + return '%s:%s:%s<%x>' \ + % (self.cvs_file, self.symbol, self.branch_number, self.id,) + + +class CVSBranchNoop(CVSBranch): + """A CVSBranch whose source is a CVSRevisionAbsent.""" + + def get_cvs_symbol_ids_opened(self): + return [] + + +# A map +# +# {nondead(source_cvs_rev) : cvs_branch_subtype} +# +# , where nondead() means that the cvs revision exists and is not +# 'dead', and CVS_BRANCH_SUBTYPE is the subtype of CVSBranch that +# should be used. +cvs_branch_type_map = { + False : CVSBranchNoop, + True : CVSBranch, + } class CVSTag(CVSSymbol): - """Represent the creation of a tag on a particular CVSFile.""" + """Represent the creation of a tag on a particular CVSFile. - def __init__(self, id, cvs_file, symbol, rev_id): - """Initialize a CVSTag. + Members: + ID -- (string) unique ID for this item. + CVS_FILE -- (CVSFile) CVSFile affected by this item. + SYMBOL -- (Symbol) the symbol affected by this CVSSymbol. + SOURCE_LOD -- (LineOfDevelopment) the LOD that is the source for this + CVSSymbol. + SOURCE_ID -- (int) the ID of the CVSRevision or CVSBranch that is being + tagged. 
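The `get_pred_ids`/`get_succ_ids` bookkeeping for branches shown above can be sketched standalone: a branch's single predecessor is its source item, and its successors are the symbols rooted at it plus the first revision on the branch, if any. The function names here are stand-ins, not the real API:

```python
# Hedged sketch of CVSBranch.get_pred_ids / get_succ_ids as plain
# functions over ids, mirroring the set-building in the patch above.
def branch_pred_ids(source_id):
    return {source_id}

def branch_succ_ids(tag_ids, branch_ids, next_id):
    retval = set(tag_ids + branch_ids)
    if next_id is not None:
        retval.add(next_id)   # first revision on the branch, if any
    return retval

assert branch_pred_ids(7) == {7}
assert branch_succ_ids([1, 2], [3], next_id=9) == {1, 2, 3, 9}
assert branch_succ_ids([], [], next_id=None) == set()
```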
+ + """ - Arguments: - ID --> (string) unique ID for this item - CVS_FILE --> (CVSFile) CVSFile affected by this revision - SYMBOL --> (Symbol) the corresponding symbol - REV_ID --> (int) id of CVSRevision being tagged""" + def __init__(self, id, cvs_file, symbol, source_lod, source_id): + """Initialize a CVSTag.""" - CVSSymbol.__init__(self, id, cvs_file, symbol_id, rev_id) + CVSSymbol.__init__(self, id, cvs_file, symbol, source_lod, source_id) def __getstate__(self): - return (self.id, self.cvs_file.id, self.symbol.id, self.rev_id) + return ( + self.id, self.cvs_file.id, self.symbol.id, + self.source_lod.id, self.source_id, + ) def __setstate__(self, data): - (self.id, cvs_file_id, symbol_id, self.rev_id) = data + (self.id, cvs_file_id, symbol_id, source_lod_id, self.source_id) = data self.cvs_file = Ctx()._cvs_file_db.get_file(cvs_file_id) self.symbol = Ctx()._symbol_db.get_symbol(symbol_id) + self.source_lod = Ctx()._symbol_db.get_symbol(source_lod_id) + + def get_pred_ids(self): + return set([self.source_id]) + + def get_succ_ids(self): + return set() + + def get_cvs_symbol_ids_opened(self): + return [] + + def __str__(self): + """For convenience only. The format is subject to change at any time.""" + + return '%s:%s<%x>' \ + % (self.cvs_file, self.symbol, self.id,) + + +class CVSTagNoop(CVSTag): + """A CVSTag whose source is a CVSRevisionAbsent.""" + + pass + + +# A map +# +# {nondead(source_cvs_rev) : cvs_tag_subtype} +# +# , where nondead() means that the cvs revision exists and is not +# 'dead', and CVS_TAG_SUBTYPE is the subtype of CVSTag that should be +# used. +cvs_tag_type_map = { + False : CVSTagNoop, + True : CVSTag, + } diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_item_database.py cvs2svn-2.0.0/cvs2svn_lib/cvs_item_database.py --- cvs2svn-1.5.x/cvs2svn_lib/cvs_item_database.py 2006-09-16 23:34:23.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/cvs_item_database.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) 
# # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -19,28 +19,42 @@ from __future__ import generators -import struct import cPickle from cvs2svn_lib.boolean import * +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.common import DB_OPEN_WRITE from cvs2svn_lib.common import FatalError -from cvs2svn_lib.cvs_item import CVSRevision +from cvs2svn_lib.log import Log +from cvs2svn_lib.cvs_item import CVSRevisionAdd +from cvs2svn_lib.cvs_item import CVSRevisionChange +from cvs2svn_lib.cvs_item import CVSRevisionDelete +from cvs2svn_lib.cvs_item import CVSRevisionNoop from cvs2svn_lib.cvs_item import CVSBranch +from cvs2svn_lib.cvs_item import CVSBranchNoop from cvs2svn_lib.cvs_item import CVSTag -from cvs2svn_lib.primed_pickle import get_memos -from cvs2svn_lib.primed_pickle import PrimedPickler -from cvs2svn_lib.primed_pickle import PrimedUnpickler -from cvs2svn_lib.record_table import NewRecordTable -from cvs2svn_lib.record_table import OldRecordTable +from cvs2svn_lib.cvs_item import CVSTagNoop +from cvs2svn_lib.cvs_file_items import CVSFileItems +from cvs2svn_lib.serializer import PrimedPickleSerializer +from cvs2svn_lib.database import IndexedStore + + +_cvs_item_primer = ( + CVSRevisionAdd, CVSRevisionChange, + CVSRevisionDelete, CVSRevisionNoop, + CVSBranch, CVSBranchNoop, + CVSTag, CVSTagNoop, + ) class NewCVSItemStore: """A file of sequential CVSItems, grouped by CVSFile. - The file consists of a sequence of pickles. The zeroth one is an - 'unpickler_memo' as described in the primed_pickle module. - Subsequent ones are pickled lists of CVSItems, each list containing - all of the CVSItems for a single file. 
+ The file consists of a sequence of pickles. The zeroth one is a + Serializer as described in the serializer module. Subsequent ones + are pickled lists of CVSItems, each list containing all of the + CVSItems for a single file. We don't use a single pickler for all items because the memo would grow too large.""" @@ -50,162 +64,53 @@ class NewCVSItemStore: self.f = open(filename, 'wb') - primer = (CVSRevision, CVSBranch, CVSTag,) - (pickler_memo, unpickler_memo,) = get_memos(primer) - self.pickler = PrimedPickler(pickler_memo) - cPickle.dump(unpickler_memo, self.f, -1) - - self.current_file_id = None - self.current_file_items = [] - - def _flush(self): - """Write the current items to disk.""" - - if self.current_file_items: - self.pickler.dumpf(self.f, self.current_file_items) - self.current_file_id = None - self.current_file_items = [] - - def add(self, cvs_item): - """Write cvs_item into the database.""" - - if cvs_item.cvs_file.id != self.current_file_id: - self._flush() - self.current_file_id = cvs_item.cvs_file.id - self.current_file_items.append(cvs_item) - - def close(self): - self._flush() - self.f.close() - - -# Convert file offsets to 8-bit little-endian unsigned longs... -INDEX_FORMAT = '<Q' -# ...but then truncate to 5 bytes. (This is big enough to represent a -# terabyte.) -INDEX_FORMAT_LEN = 5 - - -class NewIndexTable(NewRecordTable): - def __init__(self, filename): - NewRecordTable.__init__(self, filename, INDEX_FORMAT_LEN) - - def pack(self, v): - return struct.pack(INDEX_FORMAT, v)[:INDEX_FORMAT_LEN] - - -class NewIndexedCVSItemStore: - """A file of CVSItems that is written sequentially. - - The file consists of a sequence of pickles. The zeroth one is a - tuple (pickler_memo, unpickler_memo) as described in the - primed_pickle module. Subsequent ones are pickled CVSItems. 
The - offset of each CVSItem in the file is stored to an index table so - that the data can later be retrieved randomly (via - OldIndexedCVSItemStore).""" - - def __init__(self, filename, index_filename): - """Initialize an instance, creating the files and writing the primer.""" - - self.f = open(filename, 'wb') - self.index_table = NewIndexTable(index_filename) - - primer = (CVSRevision, CVSBranch, CVSTag,) - (pickler_memo, unpickler_memo,) = get_memos(primer) - self.pickler = PrimedPickler(pickler_memo) - cPickle.dump((pickler_memo, unpickler_memo,), self.f, -1) + self.serializer = PrimedPickleSerializer( + _cvs_item_primer + (CVSFileItems,) + ) + cPickle.dump(self.serializer, self.f, -1) - def add(self, cvs_item): - """Write cvs_item into the database.""" + def add(self, cvs_file_items): + """Write CVS_FILE_ITEMS into the database.""" - self.index_table[cvs_item.id] = self.f.tell() - self.pickler.dumpf(self.f, cvs_item) + self.serializer.dumpf(self.f, cvs_file_items) def close(self): - self.index_table.close() self.f.close() + self.f = None class OldCVSItemStore: """Read a file created by NewCVSItemStore. - The file must be read sequentially, except that it is possible to - read old CVSItems from the current CVSFile.""" + The file must be read sequentially, one CVSFileItems instance at a + time.""" def __init__(self, filename): self.f = open(filename, 'rb') # Read the memo from the first pickle: - unpickler_memo = cPickle.load(self.f) - self.unpickler = PrimedUnpickler(unpickler_memo) + self.serializer = cPickle.load(self.f) - self.current_file_items = [] - self.current_file_map = {} + def iter_cvs_file_items(self): + """Iterate through the CVSFileItems instances, one file at a time. 
- def _read_file_chunk(self): - self.current_file_items = self.unpickler.loadf(self.f) - self.current_file_map = {} - for item in self.current_file_items: - self.current_file_map[item.id] = item + Each time yield a CVSFileItems instance for one CVSFile.""" - def __iter__(self): - while True: try: - self._read_file_chunk() + while True: + yield self.serializer.loadf(self.f) except EOFError: return - for item in self.current_file_items: - yield item - - def __getitem__(self, id): - try: - return self.current_file_map[id] - except KeyError: - raise FatalError( - 'Key %r not found within items currently accessible.' % (id,)) - - -class OldIndexTable(OldRecordTable): - PAD = '\0' * (struct.calcsize(INDEX_FORMAT) - INDEX_FORMAT_LEN) - - def __init__(self, filename): - OldRecordTable.__init__(self, filename, INDEX_FORMAT_LEN) - - def unpack(self, s): - (v,) = struct.unpack(INDEX_FORMAT, s + self.PAD) - return v - - -class OldIndexedCVSItemStore: - """Read a pair of files created by NewIndexedCVSItemStore. 
- - The file can be read randomly but it cannot be written to.""" - - def __init__(self, filename, index_filename): - self.f = open(filename, 'rb') - self.index_table = OldIndexTable(index_filename) - - # Read the memo from the first pickle: - (pickler_memo, unpickler_memo,) = cPickle.load(self.f) - self.unpickler = PrimedUnpickler(unpickler_memo) - - def _fetch(self, offset): - self.f.seek(offset) - return self.unpickler.loadf(self.f) - - def __iter__(self): - for offset in self.index_table: - if offset != 0: - yield self._fetch(offset) - - def __getitem__(self, id): - offset = self.index_table[id] - if offset == 0: - raise KeyError() - return self._fetch(offset) def close(self): self.f.close() - self.index_table.close() + self.f = None + + +def IndexedCVSItemStore(filename, index_filename, mode): + return IndexedStore( + filename, index_filename, mode, + PrimedPickleSerializer(_cvs_item_primer) + ) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_repository.py cvs2svn-2.0.0/cvs2svn_lib/cvs_repository.py --- cvs2svn-1.5.x/cvs2svn_lib/cvs_repository.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/cvs_repository.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,143 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. 
-# ==================================================================== - -"""This module provides access to the CVS repository for cvs2svn.""" - - -import os -import re - -from cvs2svn_lib.boolean import * -from cvs2svn_lib.common import FatalError -from cvs2svn_lib.context import Ctx -from cvs2svn_lib.process import check_command_runs -from cvs2svn_lib.process import SimplePopen -from cvs2svn_lib.process import CommandFailedException - - -class CVSRepository: - """A CVS repository from which data can be extracted.""" - - def __init__(self, cvs_repos_path): - """CVS_REPOS_PATH is the top of the CVS repository (at least as - far as this run is concerned).""" - - if not os.path.isdir(cvs_repos_path): - raise FatalError("The specified CVS repository path '%s' is not an " - "existing directory." % cvs_repos_path) - - self.cvs_repos_path = os.path.normpath(cvs_repos_path) - self.cvs_prefix_re = re.compile( - r'^' + re.escape(self.cvs_repos_path) - + r'(' + re.escape(os.sep) + r'|$)') - - def get_co_pipe(self, cvs_rev, suppress_keyword_substitution=False): - """Return a command string, and a pipe from which the file - contents of CVS_REV can be read. CVS_REV is a CVSRevision. If - SUPPRESS_KEYWORD_SUBSTITUTION is True, then suppress the - substitution of RCS/CVS keywords in the output. Standard output - of the pipe returns the text of that CVS Revision. - - The command string that is returned is provided for use in error - messages; it is not escaped in such a way that it could - necessarily be executed.""" - - raise NotImplementedError - - -class CVSRepositoryViaRCS(CVSRepository): - """A CVSRepository accessed via RCS.""" - - def __init__(self, cvs_repos_path): - CVSRepository.__init__(self, cvs_repos_path) - try: - check_command_runs([ Ctx().co_executable, '-V' ], 'co') - except CommandFailedException, e: - raise FatalError('%s\n' - 'Please check that co is installed and in your PATH\n' - '(it is a part of the RCS software).' 
% (e,)) - - def get_co_pipe(self, cvs_rev, suppress_keyword_substitution=False): - pipe_cmd = [ Ctx().co_executable, '-q', '-x,v', '-p' + cvs_rev.rev ] - if suppress_keyword_substitution: - pipe_cmd.append('-kk') - pipe_cmd.append(cvs_rev.cvs_file.filename) - pipe = SimplePopen(pipe_cmd, True) - pipe.stdin.close() - return ' '.join(pipe_cmd), pipe - - -class CVSRepositoryViaCVS(CVSRepository): - """A CVSRepository accessed via CVS.""" - - def __init__(self, cvs_repos_path): - CVSRepository.__init__(self, cvs_repos_path) - # Ascend above the specified root if necessary, to find the - # cvs_repository_root (a directory containing a CVSROOT directory) - # and the cvs_module (the path of the conversion root within the - # cvs repository) NB: cvs_module must be seperated by '/' *not* by - # os.sep . - def is_cvs_repository_root(path): - return os.path.isdir(os.path.join(path, 'CVSROOT')) - - self.cvs_repository_root = os.path.abspath(self.cvs_repos_path) - self.cvs_module = "" - while not is_cvs_repository_root(self.cvs_repository_root): - # Step up one directory: - prev_cvs_repository_root = self.cvs_repository_root - self.cvs_repository_root, module_component = \ - os.path.split(self.cvs_repository_root) - if self.cvs_repository_root == prev_cvs_repository_root: - # Hit the root (of the drive, on Windows) without finding a - # CVSROOT dir. - raise FatalError( - "the path '%s' is not a CVS repository, nor a path " - "within a CVS repository. A CVS repository contains " - "a CVSROOT directory within its root directory." 
- % (self.cvs_repos_path,)) - - self.cvs_module = module_component + "/" + self.cvs_module - - os.environ['CVSROOT'] = self.cvs_repository_root - - def cvs_ok(global_arguments): - check_command_runs( - [ Ctx().cvs_executable ] + global_arguments + [ '--version' ], - 'cvs') - - self.global_arguments = [ "-q", "-R" ] - try: - cvs_ok(self.global_arguments) - except CommandFailedException, e: - self.global_arguments = [ "-q" ] - try: - cvs_ok(self.global_arguments) - except CommandFailedException, e: - raise FatalError( - '%s\n' - 'Please check that cvs is installed and in your PATH.' % (e,)) - - def get_co_pipe(self, cvs_rev, suppress_keyword_substitution=False): - pipe_cmd = [ Ctx().cvs_executable ] + self.global_arguments + \ - [ 'co', '-r' + cvs_rev.rev, '-p' ] - if suppress_keyword_substitution: - pipe_cmd.append('-kk') - pipe_cmd.append(self.cvs_module + cvs_rev.cvs_path) - pipe = SimplePopen(pipe_cmd, True) - pipe.stdin.close() - return ' '.join(pipe_cmd), pipe - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_revision_aggregator.py cvs2svn-2.0.0/cvs2svn_lib/cvs_revision_aggregator.py --- cvs2svn-1.5.x/cvs2svn_lib/cvs_revision_aggregator.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/cvs_revision_aggregator.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,257 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. 
-# ==================================================================== - -"""This module contains the CVSRevisionAggregator class.""" - - -from cvs2svn_lib.boolean import * -from cvs2svn_lib.set_support import * -from cvs2svn_lib import config -from cvs2svn_lib.context import Ctx -from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import Database -from cvs2svn_lib.database import SDatabase -from cvs2svn_lib.database import DB_OPEN_NEW -from cvs2svn_lib.database import DB_OPEN_READ -from cvs2svn_lib.persistence_manager import PersistenceManager -from cvs2svn_lib.cvs_commit import CVSCommit -from cvs2svn_lib.svn_commit import SVNSymbolCloseCommit - - -class CVSRevisionAggregator: - """This class groups CVSRevisions into CVSCommits that represent - at least one SVNCommit.""" - - # How it works: CVSCommits are accumulated within an interval by - # metadata_id (commit log and author). - # - # In a previous implementation, we would just close a CVSCommit for - # further CVSRevisions and open a new CVSCommit if a second - # CVSRevision with the same (CVS) path arrived within the - # accumulation window. - # - # In the new code, there can be multiple open CVSCommits touching - # the same files within an accumulation window. A hash of pending - # CVSRevisions with associated CVSCommits is maintained. If a new - # CVSRevision is found to have a prev_rev in this hash, the - # corresponding CVSCommit is not eligible for accomodating the - # revision, but will be added to the dependency list of the commit - # the revision finally goes into. When a CVSCommit moves out of its - # accumulation window it is not scheduled for flush immediately, but - # instead enqueued in expired_queue. Only if all the CVSCommits - # this one depends on went out already, it can go out as well. - # Timestamps are adjusted accordingly - it could happen that a small - # CVSCommit is commited while a big commit it depends on is still - # underway in other directories. 
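The removed comment block above describes how the old aggregator accumulated CVSRevisions into CVSCommits by metadata id (author + log message) within a sliding time window. The core windowing idea, minus the dependency bookkeeping (`pending_revs`, `expired_queue`, `ready_queue`), can be sketched as follows. This is an illustrative Python 3 sketch, not cvs2svn code; `group_revisions`, the tuple layout, and the 300-second threshold are assumptions:

```python
# Illustrative only: the 300 s threshold and the (timestamp, metadata_id,
# path) tuple layout are assumptions, not cvs2svn's actual structures.
COMMIT_THRESHOLD = 300

def group_revisions(revisions):
    """Group (timestamp, metadata_id, path) tuples, sorted by timestamp,
    into (metadata_id, [path, ...]) commits.

    A group stays open while new revisions with the same metadata_id
    arrive within COMMIT_THRESHOLD seconds of the group's latest
    timestamp; the dependency tracking done by the real aggregator
    is deliberately omitted.
    """
    open_groups = {}  # metadata_id -> [t_max, [paths]]
    closed = []
    for timestamp, metadata_id, path in revisions:
        # Close every group whose accumulation window has expired:
        for mid in list(open_groups):
            t_max, paths = open_groups[mid]
            if t_max + COMMIT_THRESHOLD < timestamp:
                closed.append((mid, paths))
                del open_groups[mid]
        entry = open_groups.setdefault(metadata_id, [timestamp, []])
        entry[0] = max(entry[0], timestamp)
        entry[1].append(path)
    # Flush whatever is still open at the end of the input:
    for mid, (t_max, paths) in open_groups.items():
        closed.append((mid, paths))
    return closed
```

Note that, as the original comment explains, the real implementation additionally keeps multiple open commits per metadata id and delays flushing a commit until everything it depends on has been flushed.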
- - def __init__(self): - if not Ctx().trunk_only: - self.last_revs_db = Database( - artifact_manager.get_temp_file(config.SYMBOL_LAST_CVS_REVS_DB), - DB_OPEN_READ) - - # Map of CVSRevision metadata_ids to arrays of open CVSCommits. - # In each such array, every element has direct or indirect - # dependencies on all the preceding elements in the same array. - self.cvs_commits = {} - - # Map of CVSRevision ids to the CVSCommits they are part of. - self.pending_revs = {} - - # List of closed CVSCommits which might still have pending - # dependencies. - self.expired_queue = [] - - # List of CVSCommits that are ready to be committed, but might - # need to be delayed until a CVSRevision with a later timestamp is - # read. (This can happen if the timestamp of the ready CVSCommit - # had to be adjusted to make it later than its dependencies.) - self.ready_queue = [ ] - - # A set of symbol ids for which the last source CVSRevision has - # already been processed but which haven't been closed yet. - self._pending_symbols = set() - - # A set containing the symbol ids of closed symbols. That is, - # we've already encountered the last CVSRevision that is a source - # for that symbol, the final fill for this symbol has been done, - # and we never need to fill it again. - self._done_symbols = set() - - # This variable holds the most recently created primary svn_commit - # object. CVSRevisionAggregator maintains this variable merely - # for its date, so that it can set dates for the SVNCommits - # created in self._attempt_to_commit_symbols(). - self.latest_primary_svn_commit = None - - Ctx()._persistence_manager = PersistenceManager(DB_OPEN_NEW) - - def _get_deps(self, cvs_rev): - """Return the dependencies of CVS_REV. - - Return the tuple (MAIN_DEP, DEPS), where MAIN_DEP is the main - CVSCommit on which CVS_REV depends (or None if there is no main - dependency) and DEPS is the complete set of CVSCommit objects that - CVS_REV depends on directly or indirectly. 
(The result includes - both direct and indirect dependencies because it is used to - determine what CVSCommit CVS_REV can be added to.)""" - - main_dep = self.pending_revs.get(cvs_rev.prev_id) - if main_dep is None: - return (None, set(),) - deps = set([main_dep]) - # CVSCommits whose revisions' dependencies still have to be examined: - todo = set([main_dep]) - while todo: - dep = todo.pop() - for r in dep.revisions(): - dep2 = self.pending_revs.get(r.prev_id) - if dep2 is not None and dep2 not in deps: - deps.add(dep2) - todo.add(dep2) - - return (main_dep, deps,) - - def _extract_ready_commits(self, timestamp=None): - """Extract any active commits that expire by TIMESTAMP from - self.cvs_commits and append them to self.ready_queue. If - TIMESTAMP is not specified, then extract all commits.""" - - # First take all expired commits out of the pool of available commits. - for metadata_id, cvs_commits in self.cvs_commits.items(): - for cvs_commit in cvs_commits[:]: - if timestamp is None \ - or cvs_commit.t_max + config.COMMIT_THRESHOLD < timestamp: - self.expired_queue.append(cvs_commit) - cvs_commits.remove(cvs_commit) - if not cvs_commits: - del self.cvs_commits[metadata_id] - - # Then queue all closed commits with resolved dependencies for - # commit. We do this here instead of in _commit_ready_commits to - # avoid building deps on revisions that will be flushed - # immediately afterwards. - while self.expired_queue: - chg = False - for cvs_commit in self.expired_queue[:]: - if cvs_commit.resolve_dependencies(): - for r in cvs_commit.revisions(): - del self.pending_revs[r.id] - self.expired_queue.remove(cvs_commit) - cvs_commit.pending = False - self.ready_queue.append(cvs_commit) - chg = True - if not chg: - break - - def _commit_ready_commits(self, timestamp=None): - """Sort the commits from self.ready_queue by time, then process - them in order. 
If TIMESTAMP is specified, only process commits - that have timestamp previous to TIMESTAMP.""" - - self.ready_queue.sort() - while self.ready_queue and \ - (timestamp is None or self.ready_queue[0].t_max < timestamp): - cvs_commit = self.ready_queue.pop(0) - self.latest_primary_svn_commit = \ - cvs_commit.process_revisions(self._done_symbols) - self._attempt_to_commit_symbols() - - def process_revision(self, cvs_rev): - # Each time we read a new line, scan the accumulating commits to - # see if any are ready for processing. - self._extract_ready_commits(cvs_rev.timestamp) - - # Add this item into the set of still-available commits. - (dep, deps) = self._get_deps(cvs_rev) - cvs_commits = self.cvs_commits.setdefault(cvs_rev.metadata_id, []) - # This is pretty silly; it will add the revision to the oldest - # pending commit. It might be wiser to do time range matching to - # avoid stretching commits more than necessary. - for cvs_commit in cvs_commits: - if cvs_commit not in deps: - break - else: - author, log = Ctx()._metadata_db[cvs_rev.metadata_id] - cvs_commit = CVSCommit(cvs_rev.metadata_id, author, log) - cvs_commits.append(cvs_commit) - if dep is not None: - cvs_commit.add_dependency(dep) - cvs_commit.add_revision(cvs_rev) - self.pending_revs[cvs_rev.id] = cvs_commit - - # If there are any elements in the ready_queue at this point, they - # need to be processed, because this latest rev couldn't possibly - # be part of any of them. Limit the timestamp of commits to be - # processed, because re-stamping according to a commit's - # dependencies can alter the commit's timestamp. - self._commit_ready_commits(cvs_rev.timestamp) - - self._add_pending_symbols(cvs_rev) - - def flush(self): - """Commit anything left in self.cvs_commits. 
Then inform the - SymbolingsLogger that all commits are done.""" - - self._extract_ready_commits() - self._commit_ready_commits() - - def _add_pending_symbols(self, cvs_rev): - """Add to self._pending_symbols any symbols from CVS_REV for which - CVS_REV is the last CVSRevision. - - If we're not doing a trunk-only conversion, get the symbolic names - that this CVS_REV is the last *source* CVSRevision for and add - them to those left over from previous passes through the - aggregator.""" - - if not Ctx().trunk_only: - for symbol_id in self.last_revs_db.get('%x' % (cvs_rev.id,), []): - self._pending_symbols.add(symbol_id) - - def _attempt_to_commit_symbols(self): - """Generate one SVNCommit for each symbol in self._pending_symbols - that doesn't have an opening CVSRevision in either - self.cvs_commits, self.expired_queue or self.ready_queue.""" - - # Make a list of tuples (symbol_name, symbol) for all symbols from - # self._pending_symbols that do not have *source* CVSRevisions in - # the pending commit queues (self.expired_queue or - # self.ready_queue): - closeable_symbols = [] - pending_commits = self.expired_queue + self.ready_queue - for commits in self.cvs_commits.itervalues(): - pending_commits.extend(commits) - for symbol_id in self._pending_symbols: - for cvs_commit in pending_commits: - if cvs_commit.opens_symbol(symbol_id): - break - else: - symbol = Ctx()._symbol_db.get_symbol(symbol_id) - closeable_symbols.append( (symbol.name, symbol,) ) - - # Sort the closeable symbols so that we will always process the - # symbols in the same order, regardless of the order in which the - # dict hashing algorithm hands them back to us. We do this so - # that our tests will get the same results on all platforms. 
- closeable_symbols.sort() - for (symbol_name, symbol,) in closeable_symbols: - Ctx()._persistence_manager.put_svn_commit( - SVNSymbolCloseCommit(symbol, self.latest_primary_svn_commit.date)) - self._done_symbols.add(symbol.id) - self._pending_symbols.remove(symbol.id) - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/cvs_revision_resynchronizer.py cvs2svn-2.0.0/cvs2svn_lib/cvs_revision_resynchronizer.py --- cvs2svn-1.5.x/cvs2svn_lib/cvs_revision_resynchronizer.py 2006-09-10 19:43:25.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/cvs_revision_resynchronizer.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,176 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. 
-# ==================================================================== - -"""This module contains the CVSRevisionResynchronizer class.""" - - -from __future__ import generators - -import time - -from cvs2svn_lib.boolean import * -from cvs2svn_lib import config -from cvs2svn_lib.common import warning_prefix -from cvs2svn_lib.log import Log -from cvs2svn_lib.artifact_manager import artifact_manager - - -class CVSRevisionResynchronizer: - def __init__(self, cvs_items_db): - self.cvs_items_db = cvs_items_db - - self.resync = self._read_resync() - - self.output = open( - artifact_manager.get_temp_file(config.CVS_REVS_RESYNC_DATAFILE), 'w') - - def _read_resync(self): - """Read RESYNC_DATAFILE and return its contents. - - Return a map that maps a metadata_id to a sequence of lists which - specify a lower and upper time bound for matching up the commit: - - { metadata_id -> [[old_time_lower, old_time_upper, new_time], ...] } - - Each triplet is a list because we will dynamically expand the - lower/upper bound as we find commits that fall into a particular - msg and time range. We keep a sequence of these for each - metadata_id because a number of checkins with the same log message - (e.g. an empty log message) could need to be remapped. The lists - of triplets are sorted by old_time_lower. - - Note that we assume that we can hold the entire resync file in - memory. Really large repositories with wacky timestamps could - bust this assumption. 
Should that ever happen, then it is - possible to split the resync file into pieces and make multiple - passes, using each piece.""" - - DELTA = config.COMMIT_THRESHOLD/2 - - resync = { } - for line in file(artifact_manager.get_temp_file(config.RESYNC_DATAFILE)): - [t1, metadata_id, t2] = line.strip().split() - t1 = int(t1, 16) - metadata_id = int(metadata_id, 16) - t2 = int(t2, 16) - resync.setdefault(metadata_id, []).append([t1 - DELTA, t1 + DELTA, t2]) - - # For each metadata_id, sort the resync items: - for val in resync.values(): - val.sort() - - return resync - - def resynchronize(self, cvs_rev): - if cvs_rev.prev_id is not None: - prev_cvs_rev = self.cvs_items_db[cvs_rev.prev_id] - else: - prev_cvs_rev = None - - if cvs_rev.next_id is not None: - next_cvs_rev = self.cvs_items_db[cvs_rev.next_id] - else: - next_cvs_rev = None - - # see if this is "near" any of the resync records we have recorded - # for this metadata_id [of the log message]. - for record in self.resync.get(cvs_rev.metadata_id, []): - if record[2] == cvs_rev.timestamp: - # This means that either cvs_rev is the same revision that - # caused the resync record to exist, or cvs_rev is a different - # CVS revision that happens to have the same timestamp. In - # either case, we don't have to do anything, so we... - continue - - if record[0] <= cvs_rev.timestamp <= record[1]: - # bingo! We probably want to remap the time on this cvs_rev, - # unless the remapping would be useless because the new time - # would fall outside the COMMIT_THRESHOLD window for this - # commit group. 
- new_timestamp = record[2] - # If the new timestamp is earlier than that of our previous - # revision - if prev_cvs_rev and new_timestamp < prev_cvs_rev.timestamp: - Log().warn( - "%s: Attempt to set timestamp of revision %s on file %s" - " to time %s, which is before previous the time of" - " revision %s (%s):" - % (warning_prefix, cvs_rev.rev, cvs_rev.cvs_path, new_timestamp, - prev_cvs_rev.rev, prev_cvs_rev.timestamp)) - - # If resyncing our rev to prev_cvs_rev.timestamp + 1 will - # place the timestamp of cvs_rev within COMMIT_THRESHOLD of - # the attempted resync time, then sync back to - # prev_cvs_rev.timestamp + 1... - if ((prev_cvs_rev.timestamp + 1) - new_timestamp) \ - < config.COMMIT_THRESHOLD: - new_timestamp = prev_cvs_rev.timestamp + 1 - Log().warn("%s: Time set to %s" - % (warning_prefix, new_timestamp)) - else: - Log().warn("%s: Timestamp left untouched" % warning_prefix) - continue - - # If the new timestamp is later than that of our next revision - elif next_cvs_rev and new_timestamp > next_cvs_rev.timestamp: - Log().warn( - "%s: Attempt to set timestamp of revision %s on file %s" - " to time %s, which is after time of next" - " revision %s (%s):" - % (warning_prefix, cvs_rev.rev, cvs_rev.cvs_path, new_timestamp, - next_cvs_rev.rev, next_cvs_rev.timestamp)) - - # If resyncing our rev to next_cvs_rev.timestamp - 1 will place - # the timestamp of cvs_rev within COMMIT_THRESHOLD of the - # attempted resync time, then sync forward to - # next_cvs_rev.timestamp - 1... - if (new_timestamp - (next_cvs_rev.timestamp - 1)) \ - < config.COMMIT_THRESHOLD: - new_timestamp = next_cvs_rev.timestamp - 1 - Log().warn("%s: Time set to %s" - % (warning_prefix, new_timestamp)) - else: - Log().warn("%s: Timestamp left untouched" % warning_prefix) - continue - - # Fix for Issue #71: Avoid resyncing two consecutive revisions - # to the same timestamp. 
- elif (prev_cvs_rev and new_timestamp == prev_cvs_rev.timestamp - or next_cvs_rev and new_timestamp == next_cvs_rev.timestamp): - continue - - # adjust the time range. we want the COMMIT_THRESHOLD from the - # bounds of the earlier/latest commit in this group. - record[0] = min(record[0], - cvs_rev.timestamp - config.COMMIT_THRESHOLD/2) - record[1] = max(record[1], - cvs_rev.timestamp + config.COMMIT_THRESHOLD/2) - - msg = "PASS3 RESYNC: '%s' (%s): old time='%s' delta=%ds" \ - % (cvs_rev.cvs_path, cvs_rev.rev, time.ctime(cvs_rev.timestamp), - new_timestamp - cvs_rev.timestamp) - Log().verbose(msg) - - cvs_rev.timestamp = new_timestamp - - # stop looking for hits - break - - self.output.write( - '%08lx %x %x\n' - % (cvs_rev.timestamp, cvs_rev.metadata_id, cvs_rev.id,)) - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/database.py cvs2svn-2.0.0/cvs2svn_lib/database.py --- cvs2svn-1.5.x/cvs2svn_lib/database.py 2006-09-14 22:44:37.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/database.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -21,16 +21,17 @@ from __future__ import generators import sys import os -import marshal -import cStringIO import cPickle from cvs2svn_lib.boolean import * +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.common import DB_OPEN_WRITE +from cvs2svn_lib.common import DB_OPEN_NEW from cvs2svn_lib.common import warning_prefix from cvs2svn_lib.common import error_prefix -from cvs2svn_lib.primed_pickle import get_memos -from cvs2svn_lib.primed_pickle import PrimedPickler -from cvs2svn_lib.primed_pickle import PrimedUnpickler +from cvs2svn_lib.log import Log +from cvs2svn_lib.record_table import FileOffsetPacker +from cvs2svn_lib.record_table import RecordTable # DBM module selection @@ -70,12 +71,6 @@ if hasattr(anydbm._defaultmod, 'bsddb') anydbm._defaultmod = gdbm -# Always use these constants for opening databases. -DB_OPEN_READ = 'r' -DB_OPEN_WRITE = 'w' -DB_OPEN_NEW = 'n' - - class AbstractDatabase: """An abstract base class for anydbm-based databases.""" @@ -104,7 +99,7 @@ class AbstractDatabase: # *values*, because our derived classes define __getitem__ and # __setitem__ to override the storage of values, and grabbing # methods directly from the dbm object would bypass this. 
- for meth_name in ('__delitem__', 'keys', + for meth_name in ('__delitem__', '__iter__', 'has_key', '__contains__', 'iterkeys', 'clear'): meth_ref = getattr(self.db, meth_name, None) if meth_ref: @@ -115,6 +110,9 @@ class AbstractDatabase: # this method provides a fallback definition via explicit delegation: del self.db[key] + def keys(self): + return self.db.keys() + def __iter__(self): for key in self.keys(): yield key @@ -148,78 +146,161 @@ class AbstractDatabase: except KeyError: return default + def close(self): + self.db.close() + self.db = None -class SDatabase(AbstractDatabase): - """A database that can only store strings.""" - def __getitem__(self, key): - return self.db[key] +class Database(AbstractDatabase): + """A database that uses a Serializer to store objects of a certain type. - def __setitem__(self, key, value): - self.db[key] = value + Since the database entry with the key self.serializer_key is used to + store the serializer, self.serializer_key may not be used as a key for + normal entries.""" + serializer_key = '_.%$1\t;_ ' -class Database(AbstractDatabase): - """A database that uses the marshal module to store built-in types.""" + def __init__(self, filename, mode, serializer=None): + """Constructor. + + The database stores its Serializer, so none needs to be supplied + when opening an existing database.""" + + AbstractDatabase.__init__(self, filename, mode) + + if mode == DB_OPEN_NEW: + self.serializer = serializer + self.db[self.serializer_key] = cPickle.dumps(self.serializer) + else: + self.serializer = cPickle.loads(self.db[self.serializer_key]) def __getitem__(self, key): - return marshal.loads(self.db[key]) + return self.serializer.loads(self.db[key]) def __setitem__(self, key, value): - self.db[key] = marshal.dumps(value) + self.db[key] = self.serializer.dumps(value) + def keys(self): # TODO: Once needed, handle iterators as well. 
+ retval = self.db.keys() + retval.remove(self.serializer_key) + return retval + + +class IndexedDatabase: + """A file of objects that are written sequentially and read randomly. + + The objects are indexed by small non-negative integers, and a + RecordTable is used to store the index -> fileoffset map. + fileoffset=0 is used to represent an empty record. (An offset of 0 + cannot occur for a legitimate record because the serializer is + written there.) + + The main file consists of a sequence of pickles (or other serialized + data format). The zeroth record is a pickled Serializer. + Subsequent ones are objects serialized using the serializer. The + offset of each object in the file is stored to an index table so + that the data can later be retrieved randomly. + + Objects are always stored to the end of the file. If an object is + deleted or overwritten, the fact is recorded in the index_table but + the space in the pickle file is not garbage collected. This has the + advantage that one can create a modified version of a database that + shares the main data file with an old version by copying the index + file. But it has the disadvantage that space is wasted whenever + objects are written multiple times.""" + + def __init__(self, filename, index_filename, mode, serializer=None): + """Initialize an IndexedDatabase, writing the serializer if necessary. 
+ + SERIALIZER is only used if MODE is DB_OPEN_NEW; otherwise the + serializer is read from the file.""" + + self.filename = filename + self.index_filename = index_filename + self.mode = mode + if self.mode == DB_OPEN_NEW: + self.f = open(self.filename, 'wb+') + elif self.mode == DB_OPEN_WRITE: + self.f = open(self.filename, 'rb+') + elif self.mode == DB_OPEN_READ: + self.f = open(self.filename, 'rb') + else: + raise RuntimeError('Invalid mode %r' % self.mode) -class PDatabase(AbstractDatabase): - """A database that uses the cPickle module to store arbitrary objects.""" + self.index_table = RecordTable( + self.index_filename, self.mode, FileOffsetPacker()) - def __getitem__(self, key): - return cPickle.loads(self.db[key]) + if self.mode == DB_OPEN_NEW: + assert serializer is not None + self.serializer = serializer + cPickle.dump(self.serializer, self.f, -1) + else: + # Read the memo from the first pickle: + self.serializer = cPickle.load(self.f) - def __setitem__(self, key, value): - self.db[key] = cPickle.dumps(value, -1) + def __setitem__(self, index, item): + """Write ITEM into the database indexed by INDEX.""" + # Make sure we're at the end of the file: + self.f.seek(0, 2) + self.index_table[index] = self.f.tell() + self.serializer.dumpf(self.f, item) + + def _fetch(self, offset): + self.f.seek(offset) + return self.serializer.loadf(self.f) -class PrimedPDatabase(AbstractDatabase): - """A database that uses cPickle module to store arbitrary objects. + def iterkeys(self): + return self.index_table.iterkeys() - The Pickler and Unpickler are 'primed' by pre-pickling PRIMER, which - can be an arbitrary object (e.g., a list of objects that are - expected to occur frequently in the database entries). From then - on, if objects within individual database entries are recognized - from PRIMER, then only their persistent IDs need to be pickled - instead of the whole object. 
- - Concretely, when a new database is created, the pickler memo and - unpickler memo for PRIMER are computed, pickled, and stored in - db[self.primer_key] as a tuple. When an existing database is opened - for reading or update, the pickler and unpickler memos are read from - db[self.primer_key]. In either case, these memos are used to - initialize a PrimedPickler and PrimedUnpickler, which are used for - future write and read accesses respectively. - - Since the database entry with key self.primer_key is used to store - the memo, self.primer_key may not be used as a key for normal - entries.""" + def itervalues(self): + for offset in self.index_table.itervalues(): + yield self._fetch(offset) - primer_key = '_' + def __getitem__(self, index): + offset = self.index_table[index] + return self._fetch(offset) - def __init__(self, filename, mode, primer): - AbstractDatabase.__init__(self, filename, mode) + def get(self, item, default=None): + try: + return self[item] + except KeyError: + return default - if mode == DB_OPEN_NEW: - pickler_memo, unpickler_memo = get_memos(primer) - self.db[self.primer_key] = \ - cPickle.dumps((pickler_memo, unpickler_memo,)) - else: - (pickler_memo, unpickler_memo,) = \ - cPickle.loads(self.db[self.primer_key]) - self.primed_pickler = PrimedPickler(pickler_memo) - self.primed_unpickler = PrimedUnpickler(unpickler_memo) + def get_many(self, indexes): + """Generate the items with the specified INDEXES in arbitrary order.""" - def __getitem__(self, key): - return self.primed_unpickler.loads(self.db[key]) + offsets = list(self.index_table.get_many(indexes)) + # Sort the offsets to reduce disk seeking: + offsets.sort() + for offset in offsets: + yield self._fetch(offset) - def __setitem__(self, key, value): - self.db[key] = self.primed_pickler.dumps(value) + def __delitem__(self, index): + self.index_table[index] + self.index_table[index] = 0 + + def close(self): + self.index_table.close() + self.index_table = None + self.f.close() + self.f = 
None + + def __str__(self): + return 'IndexedDatabase(%r)' % (self.filename,) + + +class IndexedStore(IndexedDatabase): + """A file of items that is written sequentially and read randomly. + + This is just like IndexedDatabase, except that it has an additional + add() method which assumes that the object to be written to the + database has an 'id' member, which is used as its database index. + See IndexedDatabase for more information.""" + + def add(self, item): + """Write ITEM into the database indexed by ITEM.id.""" + + self[item.id] = item diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/dumpfile_delegate.py cvs2svn-2.0.0/cvs2svn_lib/dumpfile_delegate.py --- cvs2svn-1.5.x/cvs2svn_lib/dumpfile_delegate.py 2006-10-03 16:43:32.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/dumpfile_delegate.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -26,7 +26,7 @@ from cvs2svn_lib.common import CommandEr from cvs2svn_lib.common import FatalError from cvs2svn_lib.common import OP_ADD from cvs2svn_lib.common import OP_CHANGE -from cvs2svn_lib.common import to_utf8 +from cvs2svn_lib.context import Ctx from cvs2svn_lib.svn_repository_mirror import SVNRepositoryMirrorDelegate @@ -35,7 +35,7 @@ class DumpfileDelegate(SVNRepositoryMirr def __init__(self, dumpfile_path): """Return a new DumpfileDelegate instance, attached to a dumpfile - DUMPFILE_PATH, using to_utf8().""" + DUMPFILE_PATH, using Ctx().filename_utf8_encoder().""" self.dumpfile_path = dumpfile_path @@ -60,11 +60,12 @@ class DumpfileDelegate(SVNRepositoryMirr try: # Log messages can be converted with the 'replace' strategy, # but we can't afford any lossiness here. - pieces[i] = to_utf8(pieces[i], strict=True) + pieces[i] = Ctx().filename_utf8_encoder(pieces[i]) except UnicodeError: raise FatalError( "Unable to convert a path '%s' to internal encoding.\n" - "Consider rerunning with one or more '--encoding' parameters." + "Consider rerunning with one or more '--encoding' parameters or\n" + "with '--fallback-encoding'." % (path,)) return '/'.join(pieces) @@ -204,10 +205,25 @@ class DumpfileDelegate(SVNRepositoryMirr prop_contents = '' props_header = '' + # If the file has keywords, we must prevent CVS/RCS from expanding + # the keywords because they must be unexpanded in the repository, + # or Subversion will get confused. 
+ stream = Ctx().revision_reader.get_content_stream( + cvs_rev, suppress_keyword_substitution=s_item.has_keywords()) + + # Insert a filter to convert all EOLs to LFs if neccessary + if s_item.needs_eol_filter(): + data_reader = LF_EOL_Filter(stream) + else: + data_reader = stream + + buf = None + # treat .cvsignore as a directory property - dir_path, basename = os.path.split(cvs_rev.svn_path) + dir_path, basename = os.path.split(cvs_rev.get_svn_path()) if basename == ".cvsignore": - ignore_vals = generate_ignores(cvs_rev) + buf = data_reader.read() + ignore_vals = generate_ignores(buf) ignore_contents = '\n'.join(ignore_vals) if ignore_contents: ignore_contents += '\n' @@ -227,18 +243,12 @@ class DumpfileDelegate(SVNRepositoryMirr % (self._utf8_path(dir_path), ignore_len, ignore_len, ignore_contents)) - # If the file has keywords, we must prevent CVS/RCS from expanding - # the keywords because they must be unexpanded in the repository, - # or Subversion will get confused. - pipe_cmd, pipe = cvs_rev.cvs_file.project.cvs_repository.get_co_pipe( - cvs_rev, suppress_keyword_substitution=s_item.has_keywords) - self.dumpfile.write('Node-path: %s\n' 'Node-kind: file\n' 'Node-action: %s\n' '%s' # no property header if no props 'Text-content-length: ' - % (self._utf8_path(cvs_rev.svn_path), + % (self._utf8_path(cvs_rev.get_svn_path()), action, props_header)) pos = self.dumpfile.tell() @@ -251,28 +261,18 @@ class DumpfileDelegate(SVNRepositoryMirr if prop_contents: self.dumpfile.write(prop_contents) - # Insert a filter to convert all EOLs to LFs if neccessary - if s_item.needs_eol_filter: - data_reader = LF_EOL_Filter(pipe.stdout) - else: - data_reader = pipe.stdout - # Insert the rev contents, calculating length and checksum as we go. 
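The read-in-chunks-while-checksumming pattern used by this hunk — consuming the content stream in fixed-size pieces while accumulating an MD5 digest and a byte count, so that large file contents never have to fit in memory — can be sketched on its own. `hashlib` stands in for the deprecated `md5` module, and `CHUNK` is an illustrative stand-in for `config.PIPE_READ_SIZE`:

```python
import hashlib
import io

CHUNK = 128 * 1024   # stand-in for config.PIPE_READ_SIZE

def copy_with_checksum(src, dst):
    """Copy SRC to DST in chunks; return (length, md5 hexdigest)."""
    checksum = hashlib.md5()
    length = 0
    buf = src.read(CHUNK)
    while buf:
        checksum.update(buf)
        length += len(buf)
        dst.write(buf)
        buf = src.read(CHUNK)
    return length, checksum.hexdigest()

data = b'int main(void) { return 0; }\n'
length, digest = copy_with_checksum(io.BytesIO(data), io.BytesIO())
```

The dumpfile code must additionally seek back later to patch the length and checksum into headers it has already emitted, since both are only known once the copy finishes.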
checksum = md5.new() length = 0 - while True: + if buf is None: buf = data_reader.read(config.PIPE_READ_SIZE) - if buf == '': - break + while buf != '': checksum.update(buf) length += len(buf) self.dumpfile.write(buf) + buf = data_reader.read(config.PIPE_READ_SIZE) - pipe.stdout.close() - error_output = pipe.stderr.read() - exit_status = pipe.wait() - if exit_status: - raise CommandError(pipe_cmd, exit_status, error_output) + stream.close() # Go back to patch up the length and checksum headers: self.dumpfile.seek(pos, 0) @@ -305,6 +305,10 @@ class DumpfileDelegate(SVNRepositoryMirr self._add_or_change_path(s_item, OP_CHANGE) + def skip_path(self, cvs_rev): + """Ensure that the unneeded revisions are accounted for as well.""" + Ctx().revision_reader.skip_content(cvs_rev) + def delete_path(self, path): """Emit the deletion of PATH.""" @@ -333,34 +337,14 @@ class DumpfileDelegate(SVNRepositoryMirr self.dumpfile.close() -def generate_ignores(cvs_rev): - # Read in props - pipe_cmd, pipe = \ - cvs_rev.cvs_file.project.cvs_repository.get_co_pipe(cvs_rev) - buf = pipe.stdout.read(config.PIPE_READ_SIZE) - raw_ignore_val = "" - while buf: - raw_ignore_val += buf - buf = pipe.stdout.read(config.PIPE_READ_SIZE) - pipe.stdout.close() - error_output = pipe.stderr.read() - exit_status = pipe.wait() - if exit_status: - raise CommandError(pipe_cmd, exit_status, error_output) - - # Tweak props: First, convert any spaces to newlines... - raw_ignore_val = '\n'.join(raw_ignore_val.split()) - raw_ignores = raw_ignore_val.split('\n') +def generate_ignores(raw_ignore_val): ignore_vals = [ ] - for ignore in raw_ignores: + for ignore in raw_ignore_val.split(): # Reset the list if we encounter a '!' 
# See http://cvsbook.red-bean.com/cvsbook.html#cvsignore if ignore == '!': ignore_vals = [ ] - continue - # Skip empty lines - if len(ignore) == 0: - continue + else: ignore_vals.append(ignore) return ignore_vals @@ -374,7 +358,7 @@ class LF_EOL_Filter: self.carry_cr = False self.eof = False - def read(self, size): + def read(self, size=-1): while True: buf = self.stream.read(size) self.eof = len(buf) == 0 diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/fill_source.py cvs2svn-2.0.0/cvs2svn_lib/fill_source.py --- cvs2svn-1.5.x/cvs2svn_lib/fill_source.py 2006-09-02 10:44:50.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/fill_source.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,57 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. 
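The rewritten `generate_ignores()` above implements CVS's `.cvsignore` semantics: entries are whitespace-separated, and a lone `!` resets the list accumulated so far. A standalone version of the same logic, runnable in modern Python:

```python
def generate_ignores(raw_ignore_val):
    """Split a .cvsignore body into patterns; a lone '!' resets the list."""
    ignore_vals = []
    for ignore in raw_ignore_val.split():   # split() also drops empty entries
        if ignore == '!':
            # See http://cvsbook.red-bean.com/cvsbook.html#cvsignore
            ignore_vals = []
        else:
            ignore_vals.append(ignore)
    return ignore_vals

print(generate_ignores('*.o core\n! *.tmp'))   # prints ['*.tmp']
```

Using `str.split()` with no argument is what lets the new code drop the old explicit checks for empty lines.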
-# ==================================================================== - -"""This module contains the FillSource class.""" - - -from cvs2svn_lib.boolean import * -from cvs2svn_lib.context import Ctx - - -class FillSource: - """Representation of a fill source used by the symbol filler in - SVNRepositoryMirror.""" - - def __init__(self, project, prefix, node): - """Create an unscored fill source with a prefix and a key.""" - - self.project = project - self.prefix = prefix - self.node = node - self.score = None - self.revnum = None - - def set_score(self, score, revnum): - """Set the SCORE and REVNUM.""" - - self.score = score - self.revnum = revnum - - def __cmp__(self, other): - """Comparison operator used to sort FillSources in descending - score order. If the scores are the same, prefer trunk, - or alphabetical order by path - these cases are mostly - useful to stabilize testsuite results.""" - - if self.score is None or other.score is None: - raise TypeError, 'Tried to compare unscored FillSource' - - return cmp(other.score, self.score) \ - or cmp(other.prefix == self.project.trunk_path, - self.prefix == self.project.trunk_path) \ - or cmp(self.prefix, other.prefix) - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/last_symbolic_name_database.py cvs2svn-2.0.0/cvs2svn_lib/last_symbolic_name_database.py --- cvs2svn-1.5.x/cvs2svn_lib/last_symbolic_name_database.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/last_symbolic_name_database.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,68 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. 
-# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. -# ==================================================================== - -"""This module contains database facilities used by cvs2svn.""" - - -from cvs2svn_lib.boolean import * -from cvs2svn_lib import config -from cvs2svn_lib.common import OP_DELETE -from cvs2svn_lib.context import Ctx -from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import Database -from cvs2svn_lib.database import DB_OPEN_NEW - - -class LastSymbolicNameDatabase: - """Passing every CVSRevision in s-revs to this class will result in - a Database whose key is the last CVS Revision a symbolicname was - seen in, and whose value is a list of all symbolicnames that were - last seen in that revision.""" - - def __init__(self): - # A map { symbol_id : cvs_rev.id } of the chronologically last - # CVSRevision that had the symbol as a tag or branch. Once we've - # gone through all the revs, symbols.keys() will be a list of all - # tag and branch symbol_ids, and their corresponding values will - # be the id of the last CVS revision that they were used in. - self._symbols = {} - - def log_revision(self, cvs_rev): - """Gather last CVS Revision for symbolic name info and tag info.""" - - for tag_id in cvs_rev.tag_ids: - self._symbols[tag_id] = cvs_rev.id - if cvs_rev.op != OP_DELETE: - for branch_id in cvs_rev.branch_ids: - self._symbols[branch_id] = cvs_rev.id - - def create_database(self): - """Create the SYMBOL_LAST_CVS_REVS_DB. - - The database will hold an inversion of symbols above--a map { - cvs_rev.id : [ symbol, ... 
] of symbols that close in each - CVSRevision.""" - - symbol_revs_db = Database( - artifact_manager.get_temp_file(config.SYMBOL_LAST_CVS_REVS_DB), - DB_OPEN_NEW) - for symbol_id, rev_id in self._symbols.items(): - rev_key = '%x' % (rev_id,) - ary = symbol_revs_db.get(rev_key, []) - ary.append(symbol_id) - symbol_revs_db[rev_key] = ary - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/line_of_development.py cvs2svn-2.0.0/cvs2svn_lib/line_of_development.py --- cvs2svn-1.5.x/cvs2svn_lib/line_of_development.py 2006-09-10 14:55:15.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/line_of_development.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,60 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. -# ==================================================================== - -"""This module contains classes to store CVS branches.""" - - -from cvs2svn_lib.boolean import * -from cvs2svn_lib.context import Ctx - - -class LineOfDevelopment: - """Base class for Trunk and Branch.""" - - def make_path(self, cvs_file): - raise NotImplementedError() - - -class Trunk(LineOfDevelopment): - """Represent the main line of development.""" - - def __init__(self): - pass - - def make_path(self, cvs_file): - return cvs_file.project.make_trunk_path(cvs_file.cvs_path) - - def __str__(self): - """For convenience only. 
The format is subject to change at any time.""" - - return 'Trunk' - - -class Branch(LineOfDevelopment): - """An object that describes a CVS branch.""" - - def __init__(self, symbol): - self.symbol = symbol - - def make_path(self, cvs_file): - return cvs_file.project.make_branch_path(self.symbol, cvs_file.cvs_path) - - def __str__(self): - """For convenience only. The format is subject to change at any time.""" - - return 'Branch %r <%x>' % (self.symbol.name, self.symbol.id,) - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/log.py cvs2svn-2.0.0/cvs2svn_lib/log.py --- cvs2svn-1.5.x/cvs2svn_lib/log.py 2006-08-26 17:28:07.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/log.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -19,6 +19,7 @@ import sys import time +import threading from cvs2svn_lib.boolean import * @@ -26,8 +27,16 @@ from cvs2svn_lib.boolean import * class Log: """A Simple logging facility. - Each line will be timestamped if self.use_timestamps is True. This - class is a Borg, see + If self.log_level is DEBUG or higher, each line will be timestamped + with the number of wall-clock seconds since the time when this + module was first imported. + + If self.use_timestamps is True, each line will be timestamped with a + human-readable clock time. + + The public methods of this class are thread-safe. + + This class is a Borg; see http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66531.""" # These constants represent the log levels that this class supports. 
@@ -37,6 +46,9 @@ class Log: QUIET = 0 NORMAL = 1 VERBOSE = 2 + DEBUG = 3 + + start_time = time.time() __shared_state = {} @@ -48,50 +60,86 @@ class Log: # Set this to True if you want to see timestamps on each line output. self.use_timestamps = False self.logger = sys.stdout + # Lock to serialize writes to the log: + self.lock = threading.Lock() def increase_verbosity(self): - self.log_level = min(self.log_level + 1, self.VERBOSE) + self.lock.acquire() + try: + self.log_level = min(self.log_level + 1, Log.DEBUG) + finally: + self.lock.release() def decrease_verbosity(self): - self.log_level = max(self.log_level - 1, self.WARN) + self.lock.acquire() + try: + self.log_level = max(self.log_level - 1, Log.WARN) + finally: + self.lock.release() + + def is_on(self, level): + """Return True iff messages at the specified LEVEL are currently on. + + LEVEL should be one of the constants Log.WARN, Log.QUIET, etc.""" + + return self.log_level >= level def _timestamp(self): - """Output a detailed timestamp at the beginning of each line output.""" + """Return a timestamp if needed.""" - self.logger.write(time.strftime('[%Y-%m-%d %I:%m:%S %Z] - ')) + retval = [] - def write(self, log_level, *args): - """This is the public method to use for writing to a file. Only - messages whose LOG_LEVEL is <= self.log_level will be printed. If - there are multiple ARGS, they will be separated by a space.""" + if self.log_level >= Log.DEBUG: + retval.append('%f:' % (time.time() - self.start_time,)) - if log_level > self.log_level: - return if self.use_timestamps: - self._timestamp() - self.logger.write(' '.join(map(str,args)) + "\n") + retval.append(time.strftime('[%Y-%m-%d %I:%M:%S %Z] -')) + + return retval + + def write(self, *args): + """Write a message to the log. + + This is the public method to use for writing to a file. 
If there + are multiple ARGS, they will be separated by spaces.""" + + self.lock.acquire() + try: + self.logger.write(' '.join(self._timestamp() + map(str, args)) + "\n") # Ensure that log output doesn't get out-of-order with respect to # stderr output. self.logger.flush() + finally: + self.lock.release() def warn(self, *args): """Log a message at the WARN level.""" - self.write(self.WARN, *args) + if self.is_on(Log.WARN): + self.write(*args) def quiet(self, *args): """Log a message at the QUIET level.""" - self.write(self.QUIET, *args) + if self.is_on(Log.QUIET): + self.write(*args) def normal(self, *args): """Log a message at the NORMAL level.""" - self.write(self.NORMAL, *args) + if self.is_on(Log.NORMAL): + self.write(*args) def verbose(self, *args): """Log a message at the VERBOSE level.""" - self.write(self.VERBOSE, *args) + if self.is_on(Log.VERBOSE): + self.write(*args) + + def debug(self, *args): + """Log a message at the DEBUG level.""" + + if self.is_on(Log.DEBUG): + self.write(*args) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/main.py cvs2svn-2.0.0/cvs2svn_lib/main.py --- cvs2svn-1.5.x/cvs2svn_lib/main.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/main.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,104 @@ +#!/usr/bin/env python +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2000-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. 
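The patched Log class combines two ideas: the Borg pattern (all instances share one `__dict__`, so `Log()` can be instantiated anywhere and always yields the same state) and a `threading.Lock` that serializes writes so multi-threaded output does not interleave. A minimal model of that combination, with illustrative level values:

```python
import sys
import threading

class BorgLog:
    """All instances share state (Borg pattern); writes are serialized."""

    __shared_state = {}

    def __init__(self):
        self.__dict__ = self.__shared_state
        if not self.__shared_state:          # first instantiation only
            self.log_level = 1               # NORMAL
            self.logger = sys.stdout
            self.lock = threading.Lock()

    def is_on(self, level):
        return self.log_level >= level

    def write(self, *args):
        with self.lock:                      # keep concurrent lines whole
            self.logger.write(' '.join(map(str, args)) + '\n')
            self.logger.flush()

BorgLog().log_level = 2
assert BorgLog().log_level == 2              # a "new" instance sees the change
```

Splitting `is_on()` out of `write()`, as the patch does, also lets callers skip building expensive log messages when the level is off.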
+# ==================================================================== + +import os +import errno +import gc + +try: + # Try to get access to a bunch of encodings for use with --encoding. + # See http://cjkpython.i18n.org/ for details. + import iconv_codec +except ImportError: + pass + +from cvs2svn_lib.boolean import * +from cvs2svn_lib import config +from cvs2svn_lib.common import FatalException +from cvs2svn_lib.common import FatalError +from cvs2svn_lib.run_options import RunOptions +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.pass_manager import PassManager +from cvs2svn_lib.passes import passes + + +def main(progname, cmd_args): + # Disable garbage collection, as we try not to create any circular + # data structures: + gc.disable() + + # Convenience var, so we don't have to keep instantiating this Borg. + ctx = Ctx() + + pass_manager = PassManager(passes) + + run_options = RunOptions(progname, cmd_args, pass_manager) + + # Make sure the tmp directory exists. Note that we don't check if + # it's empty -- we want to be able to use, for example, "." to hold + # tempfiles. But if we *did* want check if it were empty, we'd do + # something like os.stat(ctx.tmpdir)[stat.ST_NLINK], of course :-). + if not os.path.exists(ctx.tmpdir): + erase_tmpdir = True + os.mkdir(ctx.tmpdir) + elif not os.path.isdir(ctx.tmpdir): + raise FatalError( + "cvs2svn tried to use '%s' for temporary files, but that path\n" + " exists and is not a directory. Please make it be a directory,\n" + " or specify some other directory for temporary files." + % (ctx.tmpdir,)) + else: + erase_tmpdir = False + + # But do lock the tmpdir, to avoid process clash. + try: + os.mkdir(os.path.join(ctx.tmpdir, 'cvs2svn.lock')) + except OSError, e: + if e.errno == errno.EACCES: + raise FatalError("Permission denied:" + + " No write access to directory '%s'." 
% ctx.tmpdir) + if e.errno == errno.EEXIST: + raise FatalError( + "cvs2svn is using directory '%s' for temporary files, but\n" + " subdirectory '%s/cvs2svn.lock' exists, indicating that another\n" + " cvs2svn process is currently using '%s' as its temporary\n" + " workspace. If you are certain that is not the case,\n" + " then remove the '%s/cvs2svn.lock' subdirectory." + % (ctx.tmpdir, ctx.tmpdir, ctx.tmpdir, ctx.tmpdir,)) + raise + + try: + if run_options.profiling: + import hotshot + prof = hotshot.Profile('cvs2svn.hotshot') + prof.runcall( + pass_manager.run, run_options.start_pass, run_options.end_pass) + prof.close() + else: + pass_manager.run(run_options.start_pass, run_options.end_pass) + finally: + try: + os.rmdir(os.path.join(ctx.tmpdir, 'cvs2svn.lock')) + except: + pass + + if erase_tmpdir: + try: + os.rmdir(ctx.tmpdir) + except: + pass + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/metadata_database.py cvs2svn-2.0.0/cvs2svn_lib/metadata_database.py --- cvs2svn-1.5.x/cvs2svn_lib/metadata_database.py 2006-09-10 10:28:47.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/metadata_database.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
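The tmpdir lock in main() relies on `os.mkdir()` being atomic: exactly one process can create the `cvs2svn.lock` subdirectory, and `EEXIST` tells any other process that the directory is already in use. The same idiom in isolation (Python 3 syntax; the lock name is the one the patch uses, the helper name is illustrative):

```python
import errno
import os
import tempfile

def acquire_dir_lock(tmpdir, name='cvs2svn.lock'):
    """Take an exclusive lock by creating a subdirectory; mkdir is atomic."""
    lockdir = os.path.join(tmpdir, name)
    try:
        os.mkdir(lockdir)
    except OSError as e:
        if e.errno == errno.EEXIST:
            raise RuntimeError('%s is in use by another process' % tmpdir)
        raise
    return lockdir

tmpdir = tempfile.mkdtemp()
lock = acquire_dir_lock(tmpdir)
try:
    acquire_dir_lock(tmpdir)      # a second acquisition fails
except RuntimeError:
    pass
os.rmdir(lock)                    # release, as main() does in its finally block
os.rmdir(tmpdir)
```

Unlike lock files, a lock directory also works on filesystems where `O_EXCL` file creation is unreliable, which is presumably why a directory was chosen here.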
The terms @@ -21,26 +21,35 @@ import sha from cvs2svn_lib.boolean import * from cvs2svn_lib import config +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.common import DB_OPEN_WRITE +from cvs2svn_lib.common import DB_OPEN_NEW from cvs2svn_lib.context import Ctx +from cvs2svn_lib.log import Log from cvs2svn_lib.artifact_manager import artifact_manager from cvs2svn_lib.database import Database from cvs2svn_lib.key_generator import KeyGenerator +from cvs2svn_lib.serializer import MarshalSerializer class MetadataDatabase: """A Database to store metadata about CVSRevisions. - This database has two types of entries: + This database manages a map - digest -> id + id -> (project.id, author, log_msg,) - hex(id) -> (project.id, author, log_msg,) + where id is a unique identifier for a set of metadata. - Digest is the digest of the string (project.id + '\0' + author + - '\0' + log_msg) (or, if Ctx().cross_project_commits is True, (author - + '\0' + log_msg)), and is used to locate matching records - efficiently. id is a unique id for each record (as a hex string - ('%x' % id) when used as a key). + When the MetadataDatabase is opened in DB_OPEN_NEW mode, the mapping + + { (project, branch_name, author, log_msg) -> id } + + is also available. If the requested set of metadata has never been + seen before, a new record is created and its id is returned. This + is done by creating an SHA digest of a string containing author, + log_message, and possible project_id and/or branch_name, then + looking up the digest in the _digest_to_id map. """ @@ -49,36 +58,62 @@ class MetadataDatabase: argument to Database or anydbm.open()). Use CVS_FILE_DB to look up CVSFiles.""" - self.key_generator = KeyGenerator(1) - self.db = Database(artifact_manager.get_temp_file(config.METADATA_DB), - mode) + self.mode = mode - def get_key(self, project, author, log_msg): - """Return the id for the record for (PROJECT, AUTHOR, LOG_MSG,). 
+ if self.mode == DB_OPEN_NEW: + # A map { digest : id }: + self._digest_to_id = {} + # A key_generator to generate keys for metadata that haven't + # been seen yet: + self.key_generator = KeyGenerator(1) + elif self.mode == DB_OPEN_READ: + # In this case, we don't need key_generator or _digest_to_id. + pass + elif self.mode == DB_OPEN_WRITE: + # Modifying an existing database is not supported: + raise NotImplementedError('Mode %r is not supported' % self.mode) + + self.db = Database( + artifact_manager.get_temp_file(config.METADATA_DB), self.mode, + MarshalSerializer()) + + def get_key(self, project, branch_name, author, log_msg): + """Return the id for the specified metadata. + + Locate the record for a commit with the specified (PROJECT, + BRANCH_NAME, AUTHOR, LOG_MSG). (Depending on policy, not all of + these items are necessarily used when creating the unique id.) If there is no such record, create one and return its newly-generated id.""" - if Ctx().cross_project_commits: - s = '%s\0%s' % (author, log_msg) - else: - s = '%x\0%s\0%s' % (project.id, author, log_msg) + key = [author, log_msg] + if not Ctx().cross_project_commits: + key.append('%x' % project.id) + if not Ctx().cross_branch_commits: + key.append(branch_name or '') - digest = sha.new(s).hexdigest() + digest = sha.new('\0'.join(key)).digest() try: # See if it is already known: - return self.db[digest] + return self._digest_to_id[digest] except KeyError: pass id = self.key_generator.gen_id() - self.db['%x' % id] = (project.id, author, log_msg,) - self.db[digest] = id + self._digest_to_id[digest] = id + self.db['%x' % id] = (author, log_msg,) return id def __getitem__(self, id): """Return (author, log_msg,) for ID.""" - return self.db['%x' % (id,)][1:] + return self.db['%x' % (id,)] + + def close(self): + if self.mode == DB_OPEN_NEW: + self._digest_to_id = None + self.db.close() + self.db = None diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/openings_closings.py 
cvs2svn-2.0.0/cvs2svn_lib/openings_closings.py --- cvs2svn-1.5.x/cvs2svn_lib/openings_closings.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/openings_closings.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,17 +14,21 @@ # history and logs, available at http://cvs2svn.tigris.org/. # ==================================================================== -"""This module contains database facilities used by cvs2svn.""" +"""This module contains classes to keep track of symbol openings/closings.""" +from __future__ import generators + +import cPickle + from cvs2svn_lib.boolean import * from cvs2svn_lib import config -from cvs2svn_lib.common import OP_DELETE +from cvs2svn_lib.common import InternalError from cvs2svn_lib.context import Ctx from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import DB_OPEN_READ -from cvs2svn_lib.line_of_development import Branch +from cvs2svn_lib.symbol import Branch from cvs2svn_lib.svn_revision_range import SVNRevisionRange +from cvs2svn_lib.symbol_filling_guide import get_source_set # Constants used in SYMBOL_OPENINGS_CLOSINGS @@ -37,158 +41,208 @@ class SymbolingsLogger: This data will later be used to determine valid SVNRevision ranges from which a file can be copied when creating a branch or tag in - Subversion. Do this by finding "Openings" and "Closings" for each + Subversion. Do this by finding 'Openings' and 'Closings' for each file copied onto a branch or tag. - An "Opening" is the CVSRevision from which a given branch/tag - sprouts on a path. 
+ An 'Opening' is the beginning of the lifetime of the source + (CVSRevision or CVSBranch) from which a given CVSSymbol sprouts. - The "Closing" for that branch/tag and path is the next CVSRevision - on the same line of development as the opening. + The 'Closing' is the SVN revision when the source is deleted or + overwritten. For example, on file 'foo.c', branch BEE has branch number 1.2.2 and - obviously sprouts from revision 1.2. Therefore, 1.2 is the opening - for BEE on path 'foo.c', and 1.3 is the closing for BEE on path - 'foo.c'. Note that there may be many revisions chronologically + obviously sprouts from revision 1.2. Therefore, the SVN revision + when 1.2 is committed is the opening for BEE on path 'foo.c', and + the SVN revision when 1.3 is committed is the closing for BEE on + path 'foo.c'. Note that there may be many revisions chronologically between 1.2 and 1.3, for example, revisions on branches of 'foo.c', perhaps even including on branch BEE itself. But 1.3 is the next revision *on the same line* as 1.2, that is why it is the closing revision for those symbolic names of which 1.2 is the opening. - The reason for doing all this hullabaloo is to make branch and tag - creation as efficient as possible by minimizing the number of copies - and deletes per creation. For example, revisions 1.2 and 1.3 of - foo.c might correspond to revisions 17 and 30 in Subversion. That - means that when creating branch BEE, there is some motivation to do - the copy from one of 17-30. Now if there were another file, - 'bar.c', whose opening and closing CVSRevisions for BEE corresponded - to revisions 24 and 39 in Subversion, we would know that the ideal - thing would be to copy the branch from somewhere between 24 and 29, - inclusive. 
- """ + The reason for doing all this hullabaloo is (1) to determine what + range of SVN revision numbers can be used as the source of a copy of + a particular file onto a branch/tag, and (2) to minimize the number + of copies and deletes per creation by choosing source SVN revision + numbers that can be used for as many files as possible. + + For example, revisions 1.2 and 1.3 of foo.c might correspond to + revisions 17 and 30 in Subversion. That means that when creating + branch BEE, foo.c has to be copied from a Subversion revision number + in the range 17 <= revnum < 30. Now if there were another file, + 'bar.c', in the same directory, and 'bar.c's opening and closing for + BEE correspond to revisions 24 and 39 in Subversion, then we can + kill two birds with one stone by copying the whole directory from + somewhere in the range 24 <= revnum < 30.""" def __init__(self): self.symbolings = open( artifact_manager.get_temp_file(config.SYMBOL_OPENINGS_CLOSINGS), 'w') - # This keys of this dictionary are *source* cvs_paths for which - # we've encountered an 'opening' on the default branch. The - # values are the ids of symbols that this path has opened. 
- self._open_paths_with_default_branches = { } - def log_revision(self, cvs_rev, svn_revnum): """Log any openings and closings found in CVS_REV.""" - if isinstance(cvs_rev.lod, Branch): - branch_id = cvs_rev.lod.symbol.id - else: - branch_id = None + for (symbol_id, cvs_symbol_id,) in cvs_rev.opened_symbols: + self._log_opening(symbol_id, cvs_symbol_id, svn_revnum) + + for (symbol_id, cvs_symbol_id) in cvs_rev.closed_symbols: + self._log_closing(symbol_id, cvs_symbol_id, svn_revnum) - for symbol_id in cvs_rev.tag_ids + cvs_rev.branch_ids: - self._note_default_branch_opening(cvs_rev, symbol_id) - if cvs_rev.op != OP_DELETE: - self._log(symbol_id, svn_revnum, cvs_rev.cvs_file, branch_id, OPENING) + def log_branch_revision(self, cvs_branch, svn_revnum): + """Log any openings and closings found in CVS_BRANCH.""" - for symbol_id in cvs_rev.closed_symbol_ids: - self._log(symbol_id, svn_revnum, cvs_rev.cvs_file, branch_id, CLOSING) + for (symbol_id, cvs_symbol_id,) in cvs_branch.opened_symbols: + self._log_opening(symbol_id, cvs_symbol_id, svn_revnum) - def _log(self, symbol_id, svn_revnum, cvs_file, branch_id, type): + def _log(self, symbol_id, cvs_symbol_id, svn_revnum, type): """Log an opening or closing to self.symbolings. Write out a single line to the symbol_openings_closings file - representing that SVN_REVNUM of SVN_FILE on BRANCH_ID is either - the opening or closing (TYPE) of NAME (a symbolic name). - - TYPE should be one of the following constants: OPENING or CLOSING. + representing that SVN_REVNUM is either the opening or closing + (TYPE) of CVS_SYMBOL_ID for SYMBOL_ID. 
-    BRANCH_ID is the symbol id of the branch on which the opening or
-    closing occurred, or None if the opening/closing occurred on the
-    default branch."""
+    TYPE should be one of the following constants: OPENING or CLOSING."""

-    if branch_id is None:
-      branch_id = '*'
-    else:
-      branch_id = '%x' % branch_id
     self.symbolings.write(
-        '%x %d %s %s %x\n'
-        % (symbol_id, svn_revnum, type, branch_id, cvs_file.id))
+        '%x %d %s %x\n' % (symbol_id, svn_revnum, type, cvs_symbol_id)
+        )
+
+  def _log_opening(self, symbol_id, cvs_symbol_id, svn_revnum):
+    """Log an opening to self.symbolings.
+
+    See _log() for more information."""
+
+    self._log(symbol_id, cvs_symbol_id, svn_revnum, OPENING)
+
+  def _log_closing(self, symbol_id, cvs_symbol_id, svn_revnum):
+    """Log a closing to self.symbolings.
+
+    See _log() for more information."""
+
+    self._log(symbol_id, cvs_symbol_id, svn_revnum, CLOSING)

   def close(self):
     self.symbolings.close()
+    self.symbolings = None
+
+
+class SymbolingsReader:
+  """Provides an interface to retrieve symbol openings and closings.

-  def _note_default_branch_opening(self, cvs_rev, symbol_id):
-    """If CVS_REV is a default branch revision, log CVS_REV.cvs_path
-    as an opening for SYMBOLIC_NAME."""
-
-    self._open_paths_with_default_branches.setdefault(
-        cvs_rev.cvs_path, []).append(symbol_id)
-
-  def log_default_branch_closing(self, cvs_rev, svn_revnum):
-    """If self._open_paths_with_default_branches contains
-    CVS_REV.cvs_path, then call log each symbol in
-    self._open_paths_with_default_branches[CVS_REV.cvs_path] as a
-    closing with SVN_REVNUM as the closing revision number."""
-
-    path = cvs_rev.cvs_path
-    if path in self._open_paths_with_default_branches:
-      # log each symbol as a closing
-      for symbol_id in self._open_paths_with_default_branches[path]:
-        self._log(symbol_id, svn_revnum, cvs_rev.cvs_file, None, CLOSING)
-      # Remove them from the openings list as we're done with them.
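[Editor's note, not part of the patch.] The new `_log()` writes one whitespace-separated record per opening/closing, with the ids in hex and the SVN revision in decimal. A round-trip sketch of that record format (the single-character values for OPENING/CLOSING are assumed here; the real constants are defined elsewhere in openings_closings.py):

```python
OPENING = 'O'  # assumed value, for illustration only
CLOSING = 'C'  # assumed value, for illustration only

def format_line(symbol_id, svn_revnum, type_, cvs_symbol_id):
    # Ids are written in hex ('%x'), the SVN revision in decimal ('%d'):
    return '%x %d %s %x\n' % (symbol_id, svn_revnum, type_, cvs_symbol_id)

def parse_line(line):
    # The reader side: split on whitespace and undo the radix choices.
    (symbol_id, revnum, type_, cvs_symbol_id) = line.split()
    return (int(symbol_id, 16), int(revnum), type_, int(cvs_symbol_id, 16))

line = format_line(0x1a2b, 42, OPENING, 0x3c4d)
assert line == '1a2b 42 O 3c4d\n'
assert parse_line(line) == (0x1a2b, 42, OPENING, 0x3c4d)
```

Leading with the hex symbol id is what makes the later external sort group all records for one symbol together.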
- del self._open_paths_with_default_branches[path] - - -class OpeningsClosingsMap: - """A dictionary of openings and closings for a symbol in the current - SVNCommit. - - The user should call self.register() for the openings and closings, - then self.get_node_tree() to retrieve the information as a - SymbolFillingGuide.""" - - def __init__(self, symbol): - """Initialize OpeningsClosingsMap and prepare it for receiving - openings and closings.""" - - self.symbol = symbol - - # A dictionary of SVN_PATHS to SVNRevisionRange objects. - self.things = { } - - def register(self, svn_path, svn_revnum, type): - """Register an opening or closing revision for this symbolic name. - SVN_PATH is the source path that needs to be copied into - self.symbol, and SVN_REVNUM is either the first svn revision - number that we can copy from (our opening), or the last (not - inclusive) svn revision number that we can copy from (our - closing). TYPE indicates whether this path is an opening or a a - closing. - - The opening for a given SVN_PATH must be passed before the closing - for it to have any effect... any closing encountered before a - corresponding opening will be discarded. + This class accesses the SYMBOL_OPENINGS_CLOSINGS_SORTED file and the + SYMBOL_OFFSETS_DB. Does the heavy lifting of finding and returning + the correct opening and closing Subversion revision numbers for a + given symbolic name and SVN revision number range.""" - It is not necessary to pass a corresponding closing for every - opening.""" + def __init__(self): + """Opens the SYMBOL_OPENINGS_CLOSINGS_SORTED for reading, and + reads the offsets database into memory.""" + + self.symbolings = open( + artifact_manager.get_temp_file( + config.SYMBOL_OPENINGS_CLOSINGS_SORTED), + 'r') + # The offsets_db is really small, and we need to read and write + # from it a fair bit, so suck it into memory + offsets_db = file( + artifact_manager.get_temp_file(config.SYMBOL_OFFSETS_DB), 'rb') + # A map from symbol_id to offset. 
The values of this map are + # incremented as the openings and closings for a symbol are + # consumed. + self.offsets = cPickle.load(offsets_db) + offsets_db.close() + + def _generate_lines(self, symbol): + """Generate the lines for SYMBOL. + + SYMBOL is a TypedSymbol instance. Yield the tuple (revnum, type, + cvs_symbol_id) for all openings and closings for SYMBOL.""" + + if symbol.id in self.offsets: + # Set our read offset for self.symbolings to the offset for this + # symbol: + self.symbolings.seek(self.offsets[symbol.id]) + + while True: + line = self.symbolings.readline().rstrip() + if not line: + break + (id, revnum, type, cvs_symbol_id) = line.split() + id = int(id, 16) + revnum = int(revnum) + if id != symbol.id: + break + cvs_symbol_id = int(cvs_symbol_id, 16) + + yield (revnum, type, cvs_symbol_id) + + def _get_range_map(self, symbol): + """Return the ranges of all CVSSymbols related to SYMBOL. - # Always log an OPENING + Return a map {cvs_symbol_id : SVNRevisionRange}.""" + + range_map = {} + + for (revnum, type, cvs_symbol_id) in self._generate_lines(symbol): + range = range_map.get(cvs_symbol_id) if type == OPENING: - self.things[svn_path] = SVNRevisionRange(svn_revnum) - # Only log a closing if we've already registered the opening for that - # path. 
- elif type == CLOSING and svn_path in self.things: - self.things[svn_path].add_closing(svn_revnum) - - def is_empty(self): - """Return true if we haven't accumulated any openings or closings, - false otherwise.""" - - return not len(self.things) - - def get_things(self): - """Return a list of (svn_path, SVNRevisionRange) tuples for all - svn_paths with registered openings or closings.""" + if range is not None: + raise InternalError( + 'Multiple openings logged for %r' + % (Ctx()._cvs_items_db[cvs_symbol_id],) + ) + range_map[cvs_symbol_id] = SVNRevisionRange(revnum) + else: + if range is None: + raise InternalError( + 'Closing precedes opening for %r' + % (Ctx()._cvs_items_db[cvs_symbol_id],) + ) + if range.closing_revnum is not None: + raise InternalError( + 'Multiple closings logged for %r' + % (Ctx()._cvs_items_db[cvs_symbol_id],) + ) + range.add_closing(revnum) + + return range_map + + def get_source_set(self, svn_symbol_commit, svn_revnum): + """Return the list of possible sources for SVN_SYMBOL_COMMIT. + + SVN_SYMBOL_COMMIT is an SVNSymbolCommit instance and SVN_REVNUM is + an SVN revision number. The symbol sources will contain all + openings and closings for CVSSymbols that occur in + SVN_SYMBOL_COMMIT. 
SVN_REVNUM is only used for an internal + consistency check.""" + + symbol = svn_symbol_commit.symbol + + range_map = self._get_range_map(svn_symbol_commit.symbol) + + # A map {svn_path : SVNRevisionRange}: + openings_closings_map = {} + + for cvs_symbol in svn_symbol_commit.get_cvs_items(): + svn_path = cvs_symbol.source_lod.get_path( + cvs_symbol.cvs_file.cvs_path + ) + try: + range = range_map[cvs_symbol.id] + except KeyError: + raise InternalError('No opening for %s' % (cvs_symbol,)) + + if range.opening_revnum >= svn_revnum: + raise InternalError( + 'Opening in r%d not ready for %s in r%d' + % (range.opening_revnum, cvs_symbol, svn_revnum,) + ) + + if range.closing_revnum > svn_revnum: + range.closing_revnum = None + + openings_closings_map[svn_path] = range - return self.things.items() + return get_source_set(svn_symbol_commit.symbol, openings_closings_map) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/pass_manager.py cvs2svn-2.0.0/cvs2svn_lib/pass_manager.py --- cvs2svn-1.5.x/cvs2svn_lib/pass_manager.py 2006-06-11 22:11:44.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/pass_manager.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -18,6 +18,7 @@ import time +import gc from cvs2svn_lib.boolean import * from cvs2svn_lib import config @@ -35,6 +36,54 @@ class InvalidPassError(FatalError): self, msg + '\nUse --help-passes for more information.') +def check_for_garbage(): + # We've turned off the garbage collector because we shouldn't + # need it (we don't create circular dependencies) and because it + # is therefore a waste of time. 
So here we check for any
+  # unreachable objects and generate a debug-level warning if any
+  # occur:
+  gc.set_debug(gc.DEBUG_SAVEALL)
+  gc_count = gc.collect()
+  if gc_count:
+    if Log().is_on(Log.DEBUG):
+      Log().debug(
+          'INTERNAL: %d unreachable object(s) were garbage collected:'
+          % (gc_count,)
+          )
+      for g in gc.garbage:
+        Log().debug('    %s' % (g,))
+    del gc.garbage[:]
+
+
+class Pass(object):
+  """Base class for one step of the conversion."""
+
+  def __init__(self):
+    # By default, use the pass object's class name as the pass name:
+    self.name = self.__class__.__name__
+
+  def register_artifacts(self):
+    """Register artifacts (created and needed) in artifact_manager."""
+
+    raise NotImplementedError
+
+  def _register_temp_file(self, basename):
+    """Helper method; for brevity only."""
+
+    artifact_manager.register_temp_file(basename, self)
+
+  def _register_temp_file_needed(self, basename):
+    """Helper method; for brevity only."""
+
+    artifact_manager.register_temp_file_needed(basename, self)
+
+  def run(self, stats_keeper):
+    """Carry out this step of the conversion.
+    STATS_KEEPER is a StatsKeeper instance."""
+
+    raise NotImplementedError
+
+
 class PassManager:
   """Manage a list of passes that can be executed separately or all
   at once.
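[Editor's note, not part of the patch.] `check_for_garbage()` works because `gc.DEBUG_SAVEALL` diverts everything the collector finds into `gc.garbage` instead of freeing it, so the objects can be reported before being discarded. A minimal standalone demonstration of that mechanism:

```python
import gc

# Retain whatever the collector finds so it can be inspected:
gc.set_debug(gc.DEBUG_SAVEALL)

class Node:
    pass

a, b = Node(), Node()
a.other, b.other = b, a  # a reference cycle the refcounter cannot free
del a, b

unreachable = gc.collect()
assert unreachable > 0       # the cycle was only reclaimable by the GC
assert len(gc.garbage) > 0   # DEBUG_SAVEALL parked the objects here

# Clean up, as check_for_garbage() does:
del gc.garbage[:]
gc.set_debug(0)
```

In cvs2svn's case any nonzero count is a bug indicator, since the converter is written to avoid reference cycles and runs with the collector disabled.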
@@ -121,13 +170,16 @@ class PassManager: artifact_manager.pass_started(the_pass) the_pass.run(stats_keeper) end_time = time.time() - stats_keeper.log_duration_for_pass(end_time - start_time, i + 1) + stats_keeper.log_duration_for_pass( + end_time - start_time, i + 1, the_pass.name) start_time = end_time Ctx().clean() # Allow the artifact manager to clean up artifacts that are no # longer needed: artifact_manager.pass_done(the_pass) + check_for_garbage() + # Tell the artifact manager about passes that are being deferred: for the_pass in self.passes[index_end:]: artifact_manager.pass_deferred(the_pass) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/passes.py cvs2svn-2.0.0/cvs2svn_lib/passes.py --- cvs2svn-1.5.x/cvs2svn_lib/passes.py 2006-09-16 22:07:20.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/passes.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,49 +14,65 @@ # history and logs, available at http://cvs2svn.tigris.org/. 
# ==================================================================== -"""This module contains database facilities used by cvs2svn.""" +"""This module defines the passes that make up a conversion.""" from __future__ import generators import sys import os +import shutil import cPickle +import bisect from cvs2svn_lib.boolean import * +from cvs2svn_lib.set_support import * from cvs2svn_lib import config from cvs2svn_lib.context import Ctx from cvs2svn_lib.common import FatalException +from cvs2svn_lib.common import FatalError +from cvs2svn_lib.common import InternalError +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.common import DB_OPEN_WRITE +from cvs2svn_lib.common import Timestamper from cvs2svn_lib.log import Log +from cvs2svn_lib.pass_manager import Pass from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import DB_OPEN_NEW -from cvs2svn_lib.database import DB_OPEN_READ -from cvs2svn_lib.database import DB_OPEN_WRITE from cvs2svn_lib.cvs_file_database import CVSFileDatabase from cvs2svn_lib.metadata_database import MetadataDatabase -from cvs2svn_lib.symbol import BranchSymbol -from cvs2svn_lib.symbol import TagSymbol from cvs2svn_lib.symbol import ExcludedSymbol from cvs2svn_lib.symbol_database import SymbolDatabase from cvs2svn_lib.symbol_database import create_symbol_database -from cvs2svn_lib.line_of_development import Branch from cvs2svn_lib.symbol_statistics import SymbolStatistics from cvs2svn_lib.cvs_item import CVSRevision -from cvs2svn_lib.cvs_item_database import NewCVSItemStore -from cvs2svn_lib.cvs_item_database import NewIndexedCVSItemStore +from cvs2svn_lib.cvs_item import CVSSymbol from cvs2svn_lib.cvs_item_database import OldCVSItemStore -from cvs2svn_lib.cvs_item_database import OldIndexedCVSItemStore -from cvs2svn_lib.cvs_revision_resynchronizer import CVSRevisionResynchronizer -from cvs2svn_lib.last_symbolic_name_database import 
LastSymbolicNameDatabase +from cvs2svn_lib.cvs_item_database import IndexedCVSItemStore +from cvs2svn_lib.key_generator import KeyGenerator +from cvs2svn_lib.changeset import RevisionChangeset +from cvs2svn_lib.changeset import OrderedChangeset +from cvs2svn_lib.changeset import SymbolChangeset +from cvs2svn_lib.changeset import BranchChangeset +from cvs2svn_lib.changeset import create_symbol_changeset +from cvs2svn_lib.changeset_graph import ChangesetGraph +from cvs2svn_lib.changeset_graph_link import ChangesetGraphLink +from cvs2svn_lib.changeset_database import ChangesetDatabase +from cvs2svn_lib.changeset_database import CVSItemToChangesetTable from cvs2svn_lib.svn_commit import SVNCommit +from cvs2svn_lib.svn_commit import SVNRevisionCommit from cvs2svn_lib.openings_closings import SymbolingsLogger -from cvs2svn_lib.cvs_revision_aggregator import CVSRevisionAggregator +from cvs2svn_lib.svn_commit_creator import SVNCommitCreator from cvs2svn_lib.svn_repository_mirror import SVNRepositoryMirror from cvs2svn_lib.svn_commit import SVNInitialProjectCommit from cvs2svn_lib.persistence_manager import PersistenceManager from cvs2svn_lib.stdout_delegate import StdoutDelegate from cvs2svn_lib.collect_data import CollectData from cvs2svn_lib.process import run_command +from cvs2svn_lib.check_dependencies_pass \ + import CheckItemStoreDependenciesPass +from cvs2svn_lib.check_dependencies_pass \ + import CheckIndexedItemStoreDependenciesPass def sort_file(infilename, outfilename, options=''): @@ -67,75 +83,57 @@ def sort_file(infilename, outfilename, o # it to 'C' lc_all_tmp = os.environ.get('LC_ALL', None) os.environ['LC_ALL'] = 'C' + command = '%s -T %s %s %s > %s' % ( + Ctx().sort_executable, Ctx().tmpdir, options, infilename, outfilename + ) try: # The -T option to sort has a nice side effect. 
The Win32 sort is # case insensitive and cannot be used, and since it does not # understand the -T option and dies if we try to use it, there is # no risk that we use that sort by accident. - run_command('%s -T %s %s %s > %s' - % (Ctx().sort_executable, Ctx().tmpdir, options, - infilename, outfilename)) + run_command(command) finally: if lc_all_tmp is None: del os.environ['LC_ALL'] else: os.environ['LC_ALL'] = lc_all_tmp - -class Pass: - """Base class for one step of the conversion.""" - - def __init__(self): - # By default, use the pass object's class name as the pass name: - self.name = self.__class__.__name__ - - def register_artifacts(self): - """Register artifacts (created and needed) in artifact_manager.""" - - raise NotImplementedError - - def _register_temp_file(self, basename): - """Helper method; for brevity only.""" - - artifact_manager.register_temp_file(basename, self) - - def _register_temp_file_needed(self, basename): - """Helper method; for brevity only.""" - - artifact_manager.register_temp_file_needed(basename, self) - - def run(self, stats_keeper): - """Carry out this step of the conversion. - STATS_KEEPER is a StatsKeeper instance.""" - - raise NotImplementedError + # On some versions of Windows, os.system() does not return an error + # if the command fails. 
So add a little consistency test here that + # the output file was created and has the right size: + if not os.path.exists(outfilename) \ + or os.path.getsize(outfilename) != os.path.getsize(infilename): + raise FatalError('Command failed: "%s"' % (command,)) class CollectRevsPass(Pass): """This pass was formerly known as pass1.""" def register_artifacts(self): - self._register_temp_file(config.SYMBOL_STATISTICS_LIST) - self._register_temp_file(config.RESYNC_DATAFILE) + self._register_temp_file(config.SYMBOL_STATISTICS) self._register_temp_file(config.METADATA_DB) self._register_temp_file(config.CVS_FILES_DB) self._register_temp_file(config.CVS_ITEMS_STORE) + Ctx().revision_reader.get_revision_recorder().register_artifacts(self) def run(self, stats_keeper): Log().quiet("Examining all CVS ',v' files...") Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_NEW) - cd = CollectData(stats_keeper) + cd = CollectData( + Ctx().revision_reader.get_revision_recorder(), stats_keeper) for project in Ctx().projects: cd.process_project(project) - cd.flush() - if cd.fatal_errors: + fatal_errors = cd.close() + + if fatal_errors: raise FatalException("Pass 1 complete.\n" + "=" * 75 + "\n" + "Error summary:\n" - + "\n".join(cd.fatal_errors) + "\n" + + "\n".join(fatal_errors) + "\n" + "Exited due to fatal error(s).\n") + Ctx()._cvs_file_db.close() stats_keeper.reset_cvs_rev_info() stats_keeper.archive() Log().quiet("Done") @@ -146,10 +144,12 @@ class CollateSymbolsPass(Pass): def register_artifacts(self): self._register_temp_file(config.SYMBOL_DB) - self._register_temp_file_needed(config.SYMBOL_STATISTICS_LIST) + self._register_temp_file_needed(config.SYMBOL_STATISTICS) def run(self, stats_keeper): - symbol_stats = SymbolStatistics() + symbol_stats = SymbolStatistics( + artifact_manager.get_temp_file(config.SYMBOL_STATISTICS) + ) symbols = Ctx().symbol_strategy.get_symbols(symbol_stats) @@ -157,164 +157,1130 @@ class CollateSymbolsPass(Pass): if symbols is None or 
symbol_stats.check_consistency(symbols): sys.exit(1) + for symbol in symbols: + if isinstance(symbol, ExcludedSymbol): + symbol_stats.exclude_symbol(symbol) + + preferred_parents = symbol_stats.get_preferred_parents() + + for symbol in symbols: + if symbol in preferred_parents: + preferred_parent = preferred_parents[symbol] + del preferred_parents[symbol] + if preferred_parent is None: + symbol.preferred_parent_id = None + Log().debug('%s has no preferred parent' % (symbol,)) + else: + symbol.preferred_parent_id = preferred_parent.id + Log().debug( + 'The preferred parent of %s is %s' % (symbol, preferred_parent,) + ) + + if preferred_parents: + raise InternalError('Some symbols unaccounted for') + create_symbol_database(symbols) Log().quiet("Done") -class ResyncRevsPass(Pass): - """Clean up the revision information. +class FilterSymbolsPass(Pass): + """Delete any branches/tags that are to be excluded. - This pass was formerly known as pass2.""" + Also delete revisions on excluded branches, and delete other + references to the excluded symbols.""" def register_artifacts(self): - self._register_temp_file(config.CVS_REVS_RESYNC_DATAFILE) - self._register_temp_file(config.CVS_ITEMS_RESYNC_STORE) - self._register_temp_file(config.CVS_ITEMS_RESYNC_INDEX_TABLE) + self._register_temp_file(config.CVS_ITEMS_FILTERED_STORE) + self._register_temp_file(config.CVS_ITEMS_FILTERED_INDEX_TABLE) + self._register_temp_file(config.CVS_REVS_SUMMARY_DATAFILE) + self._register_temp_file(config.CVS_SYMBOLS_SUMMARY_DATAFILE) self._register_temp_file_needed(config.SYMBOL_DB) - self._register_temp_file_needed(config.RESYNC_DATAFILE) self._register_temp_file_needed(config.CVS_FILES_DB) self._register_temp_file_needed(config.CVS_ITEMS_STORE) - - def update_symbols(self, cvs_rev): - """Update CVS_REV.branch_ids and tag_ids based on self.symbol_db.""" - - branch_ids = [] - tag_ids = [] - for id in cvs_rev.branch_ids + cvs_rev.tag_ids: - symbol = self.symbol_db.get_symbol(id) - if 
isinstance(symbol, BranchSymbol): - branch_ids.append(symbol.id) - elif isinstance(symbol, TagSymbol): - tag_ids.append(symbol.id) - cvs_rev.branch_ids = branch_ids - cvs_rev.tag_ids = tag_ids + Ctx().revision_reader.get_revision_excluder().register_artifacts(self) def run(self, stats_keeper): Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) - self.symbol_db = SymbolDatabase() - Ctx()._symbol_db = self.symbol_db + Ctx()._symbol_db = SymbolDatabase() cvs_item_store = OldCVSItemStore( artifact_manager.get_temp_file(config.CVS_ITEMS_STORE)) - cvs_items_resync_db = NewIndexedCVSItemStore( - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_STORE), - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_INDEX_TABLE)) + cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_FILTERED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_FILTERED_INDEX_TABLE), + DB_OPEN_NEW) + revs_summary_file = open( + artifact_manager.get_temp_file(config.CVS_REVS_SUMMARY_DATAFILE), + 'w') + symbols_summary_file = open( + artifact_manager.get_temp_file(config.CVS_SYMBOLS_SUMMARY_DATAFILE), + 'w') + + revision_excluder = Ctx().revision_reader.get_revision_excluder() + + Log().quiet("Filtering out excluded symbols and summarizing items...") + + revision_excluder.start() + # Process the cvs items store one file at a time: + for cvs_file_items in cvs_item_store.iter_cvs_file_items(): + cvs_file_items.filter_excluded_symbols(revision_excluder) + cvs_file_items.mutate_symbols() + cvs_file_items.adjust_parents() + cvs_file_items.refine_symbols() + cvs_file_items.record_opened_symbols() + cvs_file_items.record_closed_symbols() + + if Log().is_on(Log.DEBUG): + cvs_file_items.check_symbol_parent_lods() + + # Store whatever is left to the new file: + for cvs_item in cvs_file_items.values(): + cvs_items_db.add(cvs_item) + + if isinstance(cvs_item, CVSRevision): + revs_summary_file.write( + '%x %08x %x\n' + % (cvs_item.metadata_id, cvs_item.timestamp, 
cvs_item.id,)) + elif isinstance(cvs_item, CVSSymbol): + symbols_summary_file.write( + '%x %x\n' % (cvs_item.symbol.id, cvs_item.id,)) + + revision_excluder.finish() + symbols_summary_file.close() + revs_summary_file.close() + cvs_items_db.close() + cvs_item_store.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() - Log().quiet("Re-synchronizing CVS revision timestamps...") + Log().quiet("Done") - resynchronizer = CVSRevisionResynchronizer(cvs_item_store) - # We may have recorded some changes in revisions' timestamp. We need to - # scan for any other files which may have had the same log message and - # occurred at "the same time" and change their timestamps, too. +class SortRevisionSummaryPass(Pass): + """Sort the revision summary file.""" - # Process the revisions file, looking for items to clean up - for cvs_item in cvs_item_store: - if isinstance(cvs_item, CVSRevision): - # Skip this entire revision if it's on an excluded branch - if isinstance(cvs_item.lod, Branch): - symbol = self.symbol_db.get_symbol(cvs_item.lod.symbol.id) - if isinstance(symbol, ExcludedSymbol): - continue + def register_artifacts(self): + self._register_temp_file(config.CVS_REVS_SUMMARY_SORTED_DATAFILE) + self._register_temp_file_needed(config.CVS_REVS_SUMMARY_DATAFILE) - self.update_symbols(cvs_item) + def run(self, stats_keeper): + Log().quiet("Sorting CVS revision summaries...") + sort_file( + artifact_manager.get_temp_file(config.CVS_REVS_SUMMARY_DATAFILE), + artifact_manager.get_temp_file( + config.CVS_REVS_SUMMARY_SORTED_DATAFILE)) + Log().quiet("Done") - resynchronizer.resynchronize(cvs_item) - cvs_items_resync_db.add(cvs_item) +class SortSymbolSummaryPass(Pass): + """Sort the symbol summary file.""" - cvs_items_resync_db.close() + def register_artifacts(self): + self._register_temp_file(config.CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE) + self._register_temp_file_needed(config.CVS_SYMBOLS_SUMMARY_DATAFILE) + def run(self, stats_keeper): + Log().quiet("Sorting CVS symbol 
summaries...") + sort_file( + artifact_manager.get_temp_file(config.CVS_SYMBOLS_SUMMARY_DATAFILE), + artifact_manager.get_temp_file( + config.CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE)) Log().quiet("Done") -class SortRevsPass(Pass): - """This pass was formerly known as pass3.""" +class InitializeChangesetsPass(Pass): + """Create preliminary CommitSets.""" def register_artifacts(self): - self._register_temp_file(config.CVS_REVS_SORTED_DATAFILE) - self._register_temp_file_needed(config.CVS_REVS_RESYNC_DATAFILE) + self._register_temp_file(config.CVS_ITEM_TO_CHANGESET) + self._register_temp_file(config.CHANGESETS_STORE) + self._register_temp_file(config.CHANGESETS_INDEX) + self._register_temp_file(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE) + self._register_temp_file_needed(config.SYMBOL_DB) + self._register_temp_file_needed(config.CVS_FILES_DB) + self._register_temp_file_needed(config.CVS_ITEMS_FILTERED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_FILTERED_INDEX_TABLE) + self._register_temp_file_needed(config.CVS_REVS_SUMMARY_SORTED_DATAFILE) + self._register_temp_file_needed( + config.CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE) + + def get_revision_changesets(self): + """Generate revision changesets, one at a time.""" + + # Create changesets for CVSRevisions: + old_metadata_id = None + old_timestamp = None + changeset = [] + for l in open( + artifact_manager.get_temp_file( + config.CVS_REVS_SUMMARY_SORTED_DATAFILE), 'r'): + [metadata_id, timestamp, cvs_item_id] = \ + [int(s, 16) for s in l.strip().split()] + if metadata_id != old_metadata_id \ + or timestamp > old_timestamp + config.COMMIT_THRESHOLD: + # Start a new changeset. 
First finish up the old changeset, + # if any: + if changeset: + yield RevisionChangeset( + self.changeset_key_generator.gen_id(), changeset) + changeset = [] + old_metadata_id = metadata_id + changeset.append(cvs_item_id) + old_timestamp = timestamp + + # Finish up the last changeset, if any: + if changeset: + yield RevisionChangeset( + self.changeset_key_generator.gen_id(), changeset) + + def get_symbol_changesets(self): + """Generate symbol changesets, one at a time.""" + + old_symbol_id = None + changeset = [] + for l in open( + artifact_manager.get_temp_file( + config.CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE), 'r'): + [symbol_id, cvs_item_id] = [int(s, 16) for s in l.strip().split()] + if symbol_id != old_symbol_id: + # Start a new changeset. First finish up the old changeset, + # if any: + if changeset: + yield create_symbol_changeset( + self.changeset_key_generator.gen_id(), + Ctx()._symbol_db.get_symbol(old_symbol_id), changeset) + changeset = [] + old_symbol_id = symbol_id + changeset.append(cvs_item_id) + + # Finish up the last changeset, if any: + if changeset: + yield create_symbol_changeset( + self.changeset_key_generator.gen_id(), + Ctx()._symbol_db.get_symbol(symbol_id), changeset) + + def compare_items(a, b): + return ( + cmp(a.timestamp, b.timestamp) + or cmp(a.cvs_file.cvs_path, b.cvs_file.cvs_path) + or cmp([int(x) for x in a.rev.split('.')], + [int(x) for x in b.rev.split('.')]) + or cmp(a.id, b.id)) + + compare_items = staticmethod(compare_items) + + def break_internal_dependencies(self, changeset): + """Split up CHANGESET if necessary to break internal dependencies. + + Return a list containing the resulting changeset(s). Iff + CHANGESET did not have to be split, then the return value will + contain a single value, namely the original CHANGESET. 
Split + CHANGESET at most once, even though the resulting changesets might + themselves have internal dependencies.""" + + cvs_items = changeset.get_cvs_items() + # We only look for succ dependencies, since by doing so we + # automatically cover pred dependencies as well. First create a + # list of tuples (pred, succ) of id pairs for CVSItems that depend + # on each other. + dependencies = [] + for cvs_item in cvs_items: + for next_id in cvs_item.get_succ_ids(): + if next_id in changeset.cvs_item_ids: + # Sanity check: a CVSItem should never depend on itself: + if next_id == cvs_item.id: + raise InternalError('Item depends on itself: %s' % (cvs_item,)) + + dependencies.append((cvs_item.id, next_id,)) + + if dependencies: + # Sort the cvs_items in a defined order (chronological to the + # extent that the timestamps are correct and unique). + cvs_items = list(cvs_items) + cvs_items.sort(self.compare_items) + indexes = {} + for i in range(len(cvs_items)): + indexes[cvs_items[i].id] = i + # How many internal dependencies would be broken by breaking the + # Changeset after a particular index? + breaks = [0] * len(cvs_items) + for (pred, succ,) in dependencies: + pred_index = indexes[pred] + succ_index = indexes[succ] + breaks[min(pred_index, succ_index)] += 1 + breaks[max(pred_index, succ_index)] -= 1 + best_i = None + best_count = -1 + best_time = 0 + for i in range(1, len(breaks)): + breaks[i] += breaks[i - 1] + for i in range(0, len(breaks) - 1): + if breaks[i] > best_count: + best_i = i + best_count = breaks[i] + best_time = cvs_items[i + 1].timestamp - cvs_items[i].timestamp + elif breaks[i] == best_count \ + and cvs_items[i + 1].timestamp - cvs_items[i].timestamp \ + < best_time: + best_i = i + best_count = breaks[i] + best_time = cvs_items[i + 1].timestamp - cvs_items[i].timestamp + # Reuse the old changeset.id for the first of the split changesets. 
+ return [ + RevisionChangeset( + changeset.id, + [cvs_item.id for cvs_item in cvs_items[:best_i + 1]]), + RevisionChangeset( + self.changeset_key_generator.gen_id(), + [cvs_item.id for cvs_item in cvs_items[best_i + 1:]]), + ] + else: + return [changeset] + + def break_all_internal_dependencies(self, changeset): + """Keep breaking CHANGESET up until all internal dependencies are broken. + + Generate the changeset fragments. This method is written + non-recursively to avoid any possible problems with recursion + depth.""" + + changesets_to_split = [changeset] + while changesets_to_split: + changesets = self.break_internal_dependencies(changesets_to_split.pop()) + if len(changesets) == 1: + yield changesets[0] + else: + # The changeset had to be split; see if either of the + # fragments have to be split: + changesets.reverse() + changesets_to_split.extend(changesets) + + def get_changesets(self): + """Return all changesets, with internal dependencies already broken.""" + + for changeset in self.get_revision_changesets(): + for split_changeset in self.break_all_internal_dependencies(changeset): + yield split_changeset + + for changeset in self.get_symbol_changesets(): + yield changeset def run(self, stats_keeper): - Log().quiet("Sorting CVS revisions...") - sort_file(artifact_manager.get_temp_file(config.CVS_REVS_RESYNC_DATAFILE), - artifact_manager.get_temp_file(config.CVS_REVS_SORTED_DATAFILE)) + Log().quiet("Creating preliminary commit sets...") + + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_FILTERED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_FILTERED_INDEX_TABLE), + DB_OPEN_READ) + + changeset_graph = ChangesetGraph( + ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_INDEX), + DB_OPEN_NEW, + ), + CVSItemToChangesetTable( + 
artifact_manager.get_temp_file(config.CVS_ITEM_TO_CHANGESET), + DB_OPEN_NEW, + ), + ) + + self.sorted_cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_NEW) + + self.changeset_key_generator = KeyGenerator(1) + + for changeset in self.get_changesets(): + if Log().is_on(Log.DEBUG): + Log().debug(repr(changeset)) + changeset_graph.store_changeset(changeset) + for cvs_item in changeset.get_cvs_items(): + self.sorted_cvs_items_db.add(cvs_item) + + self.sorted_cvs_items_db.close() + changeset_graph.close() + Ctx()._cvs_items_db.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() + Log().quiet("Done") -class CreateDatabasesPass(Pass): - """This pass was formerly known as pass4.""" +class ProcessedChangesetLogger: + def __init__(self): + self.processed_changeset_ids = [] + + def log(self, changeset_id): + if Log().is_on(Log.DEBUG): + self.processed_changeset_ids.append(changeset_id) + + def flush(self): + if self.processed_changeset_ids: + Log().debug( + 'Consumed changeset ids %s' + % (', '.join(['%x' % id for id in self.processed_changeset_ids]),)) + + del self.processed_changeset_ids[:] + + +class BreakRevisionChangesetCyclesPass(Pass): + """Break up any dependency cycles involving only RevisionChangesets.""" def register_artifacts(self): - if not Ctx().trunk_only: - self._register_temp_file(config.SYMBOL_LAST_CVS_REVS_DB) + self._register_temp_file(config.CHANGESETS_REVBROKEN_STORE) + self._register_temp_file(config.CHANGESETS_REVBROKEN_INDEX) + self._register_temp_file(config.CVS_ITEM_TO_CHANGESET_REVBROKEN) + self._register_temp_file_needed(config.SYMBOL_DB) self._register_temp_file_needed(config.CVS_FILES_DB) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) + self._register_temp_file_needed(config.CHANGESETS_STORE) + 
self._register_temp_file_needed(config.CHANGESETS_INDEX) + self._register_temp_file_needed(config.CVS_ITEM_TO_CHANGESET) + + def get_source_changesets(self): + old_changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_INDEX), + DB_OPEN_READ) + + changeset_ids = old_changeset_db.keys() + + for changeset_id in changeset_ids: + yield old_changeset_db[changeset_id] + + old_changeset_db.close() + del old_changeset_db + + def break_cycle(self, cycle): + """Break up one or more changesets in CYCLE to help break the cycle. + + CYCLE is a list of Changesets where + + cycle[i] depends on cycle[i - 1] + + Break up one or more changesets in CYCLE to make progress towards + breaking the cycle. Update self.changeset_graph accordingly. + + It is not guaranteed that the cycle will be broken by one call to + this routine, but at least some progress must be made.""" + + self.processed_changeset_logger.flush() + best_i = None + best_link = None + for i in range(len(cycle)): + # It's OK if this index wraps to -1: + link = ChangesetGraphLink( + cycle[i - 1], cycle[i], cycle[i + 1 - len(cycle)]) + + if best_i is None or link < best_link: + best_i = i + best_link = link + + if Log().is_on(Log.DEBUG): + Log().debug( + 'Breaking cycle %s by breaking node %x' % ( + ' -> '.join(['%x' % node.id for node in (cycle + [cycle[0]])]), + best_link.changeset.id,)) + + new_changesets = best_link.break_changeset(self.changeset_key_generator) + + self.changeset_graph.delete_changeset(best_link.changeset) + + for changeset in new_changesets: + self.changeset_graph.add_new_changeset(changeset) + + def run(self, stats_keeper): + Log().quiet("Breaking revision changeset dependency cycles...") + + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + 
artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + + shutil.copyfile( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET), + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_REVBROKEN)) + cvs_item_to_changeset_id = CVSItemToChangesetTable( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_REVBROKEN), + DB_OPEN_WRITE) + + changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_REVBROKEN_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_REVBROKEN_INDEX), + DB_OPEN_NEW) + + self.changeset_graph = ChangesetGraph( + changeset_db, cvs_item_to_changeset_id + ) + + max_changeset_id = 0 + for changeset in self.get_source_changesets(): + changeset_db.store(changeset) + if isinstance(changeset, RevisionChangeset): + self.changeset_graph.add_changeset(changeset) + max_changeset_id = max(max_changeset_id, changeset.id) + + self.changeset_key_generator = KeyGenerator(max_changeset_id + 1) + + self.processed_changeset_logger = ProcessedChangesetLogger() + + # Consume the graph, breaking cycles using self.break_cycle(): + for (changeset_id, time_range) in self.changeset_graph.consume_graph( + cycle_breaker=self.break_cycle): + self.processed_changeset_logger.log(changeset_id) + + self.processed_changeset_logger.flush() + del self.processed_changeset_logger + + self.changeset_graph.close() + self.changeset_graph = None + Ctx()._cvs_items_db.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() + + Log().quiet("Done") + + +class RevisionTopologicalSortPass(Pass): + """Sort RevisionChangesets into commit order. 
+ + Also convert them to OrderedChangesets, without changing their ids.""" + + def register_artifacts(self): + self._register_temp_file(config.CHANGESETS_REVSORTED_STORE) + self._register_temp_file(config.CHANGESETS_REVSORTED_INDEX) self._register_temp_file_needed(config.SYMBOL_DB) - self._register_temp_file_needed(config.CVS_ITEMS_RESYNC_STORE) - self._register_temp_file_needed(config.CVS_ITEMS_RESYNC_INDEX_TABLE) - self._register_temp_file_needed(config.CVS_REVS_SORTED_DATAFILE) - - def get_cvs_revs(self): - """Generator the CVSRevisions in CVS_REVS_SORTED_DATAFILE order.""" - - cvs_items_db = OldIndexedCVSItemStore( - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_STORE), - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_INDEX_TABLE)) - for line in file( - artifact_manager.get_temp_file(config.CVS_REVS_SORTED_DATAFILE)): - cvs_rev_id = int(line.strip().split()[-1], 16) - yield cvs_items_db[cvs_rev_id] - cvs_items_db.close() + self._register_temp_file_needed(config.CVS_FILES_DB) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) + self._register_temp_file_needed(config.CHANGESETS_REVBROKEN_STORE) + self._register_temp_file_needed(config.CHANGESETS_REVBROKEN_INDEX) + self._register_temp_file_needed(config.CVS_ITEM_TO_CHANGESET_REVBROKEN) + + def get_source_changesets(self, changeset_db): + changeset_ids = changeset_db.keys() + + for changeset_id in changeset_ids: + yield changeset_db[changeset_id] + + def get_changesets(self): + changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_REVBROKEN_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_REVBROKEN_INDEX), + DB_OPEN_READ, + ) + + changeset_graph = ChangesetGraph( + changeset_db, + CVSItemToChangesetTable( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_REVBROKEN + ), + DB_OPEN_READ, + ) + ) + + for changeset in self.get_source_changesets(changeset_db): + if 
isinstance(changeset, RevisionChangeset): + changeset_graph.add_changeset(changeset) + else: + yield changeset + + changeset_ids = [] + + # Sentry: + changeset_ids.append(None) + + for (changeset_id, time_range) in changeset_graph.consume_graph(): + changeset_ids.append(changeset_id) + + # Sentry: + changeset_ids.append(None) + + for i in range(1, len(changeset_ids) - 1): + changeset = changeset_db[changeset_ids[i]] + yield OrderedChangeset( + changeset.id, changeset.cvs_item_ids, i - 1, + changeset_ids[i - 1], changeset_ids[i + 1]) + + changeset_graph.close() + + def run(self, stats_keeper): + Log().quiet("Generating CVSRevisions in commit order...") + + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + + changesets_revordered_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_REVSORTED_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_REVSORTED_INDEX), + DB_OPEN_NEW) + + for changeset in self.get_changesets(): + changesets_revordered_db.store(changeset) + + changesets_revordered_db.close() + Ctx()._cvs_items_db.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() + + Log().quiet("Done") + + +class BreakSymbolChangesetCyclesPass(Pass): + """Break up any dependency cycles involving only SymbolChangesets.""" + + def register_artifacts(self): + self._register_temp_file(config.CHANGESETS_SYMBROKEN_STORE) + self._register_temp_file(config.CHANGESETS_SYMBROKEN_INDEX) + self._register_temp_file(config.CVS_ITEM_TO_CHANGESET_SYMBROKEN) + self._register_temp_file_needed(config.SYMBOL_DB) + self._register_temp_file_needed(config.CVS_FILES_DB) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) + 
self._register_temp_file_needed(config.CHANGESETS_REVSORTED_STORE) + self._register_temp_file_needed(config.CHANGESETS_REVSORTED_INDEX) + self._register_temp_file_needed(config.CVS_ITEM_TO_CHANGESET_REVBROKEN) + + def get_source_changesets(self): + old_changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_REVSORTED_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_REVSORTED_INDEX), + DB_OPEN_READ) + + changeset_ids = old_changeset_db.keys() + + for changeset_id in changeset_ids: + yield old_changeset_db[changeset_id] + + old_changeset_db.close() + + def break_cycle(self, cycle): + """Break up one or more changesets in CYCLE to help break the cycle. + + CYCLE is a list of Changesets where + + cycle[i] depends on cycle[i - 1] + + Break up one or more changesets in CYCLE to make progress towards + breaking the cycle. Update self.changeset_graph accordingly. + + It is not guaranteed that the cycle will be broken by one call to + this routine, but at least some progress must be made.""" + + self.processed_changeset_logger.flush() + best_i = None + best_link = None + for i in range(len(cycle)): + # It's OK if this index wraps to -1: + link = ChangesetGraphLink( + cycle[i - 1], cycle[i], cycle[i + 1 - len(cycle)]) + + if best_i is None or link < best_link: + best_i = i + best_link = link + + if Log().is_on(Log.DEBUG): + Log().debug( + 'Breaking cycle %s by breaking node %x' % ( + ' -> '.join(['%x' % node.id for node in (cycle + [cycle[0]])]), + best_link.changeset.id,)) + + new_changesets = best_link.break_changeset(self.changeset_key_generator) + + self.changeset_graph.delete_changeset(best_link.changeset) + + for changeset in new_changesets: + self.changeset_graph.add_new_changeset(changeset) def run(self, stats_keeper): - """If we're not doing a trunk-only conversion, generate the - LastSymbolicNameDatabase, which contains the last CVSRevision that - is a source for each tag or branch. 
Also record the remaining - revisions to the StatsKeeper.""" + Log().quiet("Breaking symbol changeset dependency cycles...") Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + + shutil.copyfile( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_REVBROKEN), + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_SYMBROKEN)) + cvs_item_to_changeset_id = CVSItemToChangesetTable( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_SYMBROKEN), + DB_OPEN_WRITE) + + changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_SYMBROKEN_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_SYMBROKEN_INDEX), + DB_OPEN_NEW) + + self.changeset_graph = ChangesetGraph( + changeset_db, cvs_item_to_changeset_id + ) + + max_changeset_id = 0 + for changeset in self.get_source_changesets(): + changeset_db.store(changeset) + if isinstance(changeset, SymbolChangeset): + self.changeset_graph.add_changeset(changeset) + max_changeset_id = max(max_changeset_id, changeset.id) + + self.changeset_key_generator = KeyGenerator(max_changeset_id + 1) + + self.processed_changeset_logger = ProcessedChangesetLogger() + + # Consume the graph, breaking cycles using self.break_cycle(): + for (changeset_id, time_range) in self.changeset_graph.consume_graph( + cycle_breaker=self.break_cycle): + self.processed_changeset_logger.log(changeset_id) + + self.processed_changeset_logger.flush() + del self.processed_changeset_logger - if Ctx().trunk_only: - for cvs_rev in self.get_cvs_revs(): - stats_keeper.record_cvs_rev(cvs_rev) + self.changeset_graph.close() + self.changeset_graph = None + Ctx()._cvs_items_db.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() + + Log().quiet("Done") + + +class 
BreakAllChangesetCyclesPass(Pass): + """Break up any dependency cycles that are closed by SymbolChangesets.""" + + def register_artifacts(self): + self._register_temp_file(config.CHANGESETS_ALLBROKEN_STORE) + self._register_temp_file(config.CHANGESETS_ALLBROKEN_INDEX) + self._register_temp_file(config.CVS_ITEM_TO_CHANGESET_ALLBROKEN) + self._register_temp_file_needed(config.SYMBOL_DB) + self._register_temp_file_needed(config.CVS_FILES_DB) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) + self._register_temp_file_needed(config.CHANGESETS_SYMBROKEN_STORE) + self._register_temp_file_needed(config.CHANGESETS_SYMBROKEN_INDEX) + self._register_temp_file_needed(config.CVS_ITEM_TO_CHANGESET_SYMBROKEN) + + def get_source_changesets(self): + old_changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_SYMBROKEN_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_SYMBROKEN_INDEX), + DB_OPEN_READ) + + changeset_ids = old_changeset_db.keys() + + for changeset_id in changeset_ids: + yield old_changeset_db[changeset_id] + + old_changeset_db.close() + + def _split_retrograde_changeset(self, changeset): + """CHANGESET is retrograde. 
Split it into non-retrograde changesets.""" + + Log().debug('Breaking retrograde changeset %x' % (changeset.id,)) + + self.changeset_graph.delete_changeset(changeset) + + # A map { cvs_branch_id : (max_pred_ordinal, min_succ_ordinal) } + ordinal_limits = {} + for cvs_branch in changeset.get_cvs_items(): + max_pred_ordinal = 0 + min_succ_ordinal = sys.maxint + + for pred_id in cvs_branch.get_pred_ids(): + pred_ordinal = self.ordinals.get( + self.cvs_item_to_changeset_id[pred_id], 0) + max_pred_ordinal = max(max_pred_ordinal, pred_ordinal) + + for succ_id in cvs_branch.get_succ_ids(): + succ_ordinal = self.ordinals.get( + self.cvs_item_to_changeset_id[succ_id], sys.maxint) + min_succ_ordinal = min(min_succ_ordinal, succ_ordinal) + + assert max_pred_ordinal < min_succ_ordinal + ordinal_limits[cvs_branch.id] = (max_pred_ordinal, min_succ_ordinal,) + + # Find the earliest successor ordinal: + min_min_succ_ordinal = sys.maxint + for (max_pred_ordinal, min_succ_ordinal) in ordinal_limits.values(): + min_min_succ_ordinal = min(min_min_succ_ordinal, min_succ_ordinal) + + early_item_ids = [] + late_item_ids = [] + for (id, (max_pred_ordinal, min_succ_ordinal)) in ordinal_limits.items(): + if max_pred_ordinal >= min_min_succ_ordinal: + late_item_ids.append(id) else: - Log().quiet("Finding last CVS revisions for all symbolic names...") - last_sym_name_db = LastSymbolicNameDatabase() + early_item_ids.append(id) + + assert early_item_ids + assert late_item_ids - for cvs_rev in self.get_cvs_revs(): - last_sym_name_db.log_revision(cvs_rev) - stats_keeper.record_cvs_rev(cvs_rev) + early_changeset = changeset.create_split_changeset( + self.changeset_key_generator.gen_id(), early_item_ids) + late_changeset = changeset.create_split_changeset( + self.changeset_key_generator.gen_id(), late_item_ids) + + self.changeset_graph.add_new_changeset(early_changeset) + self.changeset_graph.add_new_changeset(late_changeset) + + early_split = self._split_if_retrograde(early_changeset.id) + + # 
Because of the way we constructed it, the early changeset should + # not have to be split: + assert not early_split + + self._split_if_retrograde(late_changeset.id) + + def _split_if_retrograde(self, changeset_id): + node = self.changeset_graph[changeset_id] + pred_ordinals = [ + self.ordinals[id] + for id in node.pred_ids + if id in self.ordinals + ] + pred_ordinals.sort() + succ_ordinals = [ + self.ordinals[id] + for id in node.succ_ids + if id in self.ordinals + ] + succ_ordinals.sort() + if pred_ordinals and succ_ordinals \ + and pred_ordinals[-1] >= succ_ordinals[0]: + self._split_retrograde_changeset(self.changeset_db[node.id]) + return True + else: + return False - last_sym_name_db.create_database() + def break_segment(self, segment): + """Break a changeset in SEGMENT[1:-1]. - stats_keeper.set_stats_reflect_exclude(True) + The range SEGMENT[1:-1] is not empty, and all of the changesets in + that range are SymbolChangesets.""" + + best_i = None + best_link = None + for i in range(1, len(segment) - 1): + link = ChangesetGraphLink(segment[i - 1], segment[i], segment[i + 1]) + + if best_i is None or link < best_link: + best_i = i + best_link = link + + if Log().is_on(Log.DEBUG): + Log().debug( + 'Breaking segment %s by breaking node %x' % ( + ' -> '.join(['%x' % node.id for node in segment]), + best_link.changeset.id,)) + + new_changesets = best_link.break_changeset(self.changeset_key_generator) + self.changeset_graph.delete_changeset(best_link.changeset) + + for changeset in new_changesets: + self.changeset_graph.add_new_changeset(changeset) + + def break_cycle(self, cycle): + """Break up one or more SymbolChangesets in CYCLE to help break the cycle. + + CYCLE is a list of SymbolChangesets where + + cycle[i] depends on cycle[i - 1] + + . Break up one or more changesets in CYCLE to make progress + towards breaking the cycle. Update self.changeset_graph + accordingly. 
+ + It is not guaranteed that the cycle will be broken by one call to + this routine, but at least some progress must be made.""" + + if Log().is_on(Log.DEBUG): + Log().debug( + 'Breaking cycle %s' % ( + ' -> '.join(['%x' % changeset.id + for changeset in cycle + [cycle[0]]]),)) + + # Unwrap the cycle into a segment then break the segment: + self.break_segment([cycle[-1]] + cycle + [cycle[0]]) + + def run(self, stats_keeper): + Log().quiet("Breaking CVSSymbol dependency loops...") + + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + + shutil.copyfile( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_SYMBROKEN), + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_ALLBROKEN)) + self.cvs_item_to_changeset_id = CVSItemToChangesetTable( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_ALLBROKEN), + DB_OPEN_WRITE) + + self.changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_ALLBROKEN_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_ALLBROKEN_INDEX), + DB_OPEN_NEW) + + self.changeset_graph = ChangesetGraph( + self.changeset_db, self.cvs_item_to_changeset_id + ) + + # A map {changeset_id : ordinal} for OrderedChangesets: + self.ordinals = {} + # A map {ordinal : changeset_id}: + ordered_changeset_map = {} + # A list of all BranchChangeset ids: + branch_changeset_ids = [] + max_changeset_id = 0 + for changeset in self.get_source_changesets(): + self.changeset_db.store(changeset) + self.changeset_graph.add_changeset(changeset) + if isinstance(changeset, OrderedChangeset): + ordered_changeset_map[changeset.ordinal] = changeset.id + self.ordinals[changeset.id] = changeset.ordinal + elif isinstance(changeset, BranchChangeset): + 
branch_changeset_ids.append(changeset.id) + max_changeset_id = max(max_changeset_id, changeset.id) + + # An array of ordered_changeset ids, indexed by ordinal: + ordered_changesets = [] + for ordinal in range(len(ordered_changeset_map)): + id = ordered_changeset_map[ordinal] + ordered_changesets.append(id) + + ordered_changeset_ids = set(ordered_changeset_map.values()) + del ordered_changeset_map + + self.changeset_key_generator = KeyGenerator(max_changeset_id + 1) + + # First we scan through all BranchChangesets looking for + # changesets that are individually "retrograde" and splitting + # those up: + for changeset_id in branch_changeset_ids: + self._split_if_retrograde(changeset_id) + + del self.ordinals + + next_ordered_changeset = 0 + + self.processed_changeset_logger = ProcessedChangesetLogger() + + while self.changeset_graph: + # Consume any nodes that don't have predecessors: + for (changeset_id, time_range) \ + in self.changeset_graph.consume_nopred_nodes(): + self.processed_changeset_logger.log(changeset_id) + if changeset_id in ordered_changeset_ids: + next_ordered_changeset += 1 + ordered_changeset_ids.remove(changeset_id) + + self.processed_changeset_logger.flush() + + if not self.changeset_graph: + break + + # Now work on the next ordered changeset that has not yet been + # processed. 
BreakSymbolChangesetCyclesPass has broken any + # cycles involving only SymbolChangesets, so the presence of a + # cycle implies that there is at least one ordered changeset + # left in the graph: + assert next_ordered_changeset < len(ordered_changesets) + + id = ordered_changesets[next_ordered_changeset] + path = self.changeset_graph.search_for_path(id, ordered_changeset_ids) + if path: + if Log().is_on(Log.DEBUG): + Log().debug('Breaking path from %s to %s' % (path[0], path[-1],)) + self.break_segment(path) + else: + # There were no ordered changesets among the reachable + # predecessors, so do generic cycle-breaking: + if Log().is_on(Log.DEBUG): + Log().debug( + 'Breaking generic cycle found from %s' + % (self.changeset_db[id],) + ) + self.break_cycle(self.changeset_graph.find_cycle(id)) + + del self.processed_changeset_logger + self.changeset_graph.close() + self.changeset_graph = None + self.cvs_item_to_changeset_id = None + self.changeset_db = None + + Log().quiet("Done") + + +class TopologicalSortPass(Pass): + """Sort changesets into commit order.""" + + def register_artifacts(self): + self._register_temp_file(config.CHANGESETS_SORTED_DATAFILE) + self._register_temp_file_needed(config.SYMBOL_DB) + self._register_temp_file_needed(config.CVS_FILES_DB) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) + self._register_temp_file_needed(config.CHANGESETS_ALLBROKEN_STORE) + self._register_temp_file_needed(config.CHANGESETS_ALLBROKEN_INDEX) + self._register_temp_file_needed(config.CVS_ITEM_TO_CHANGESET_ALLBROKEN) + + def get_source_changesets(self, changeset_db): + for changeset_id in changeset_db.keys(): + yield changeset_db[changeset_id] + + def get_changesets(self): + """Generate (changeset, timestamp) pairs in commit order.""" + + changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_ALLBROKEN_STORE), + 
artifact_manager.get_temp_file(config.CHANGESETS_ALLBROKEN_INDEX), + DB_OPEN_READ) + + changeset_graph = ChangesetGraph( + changeset_db, + CVSItemToChangesetTable( + artifact_manager.get_temp_file( + config.CVS_ITEM_TO_CHANGESET_ALLBROKEN + ), + DB_OPEN_READ, + ), + ) + symbol_changeset_ids = set() + + for changeset in self.get_source_changesets(changeset_db): + changeset_graph.add_changeset(changeset) + if isinstance(changeset, SymbolChangeset): + symbol_changeset_ids.add(changeset.id) + + # Ensure a monotonically-increasing timestamp series by keeping + # track of the previous timestamp and ensuring that the following + # one is larger. + timestamper = Timestamper() + + for (changeset_id, time_range) in changeset_graph.consume_graph(): + changeset = changeset_db[changeset_id] + timestamp = timestamper.get( + time_range.t_max, changeset.id in symbol_changeset_ids + ) + yield (changeset, timestamp) + + changeset_graph.close() + + def run(self, stats_keeper): + Log().quiet("Generating CVSRevisions in commit order...") + + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._symbol_db = SymbolDatabase() + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + + sorted_changesets = open( + artifact_manager.get_temp_file(config.CHANGESETS_SORTED_DATAFILE), + 'w') + + for (changeset, timestamp) in self.get_changesets(): + sorted_changesets.write('%x %08x\n' % (changeset.id, timestamp,)) + + for cvs_item in changeset.get_cvs_items(): + stats_keeper.record_cvs_item(cvs_item) + + sorted_changesets.close() + + stats_keeper.set_stats_reflect_exclude(True) stats_keeper.archive() + Ctx()._cvs_items_db.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() + Log().quiet("Done") -class AggregateRevsPass(Pass): +class CreateRevsPass(Pass): """Generate the SVNCommit <-> CVSRevision mapping databases. 
- CVSCommit._commit also calls SymbolingsLogger to register + + SVNCommitCreator also calls SymbolingsLogger to register CVSRevisions that represent an opening or closing for a path on a branch or tag. See SymbolingsLogger for more details. This pass was formerly known as pass5.""" def register_artifacts(self): - self._register_temp_file(config.SVN_COMMITS_DB) + self._register_temp_file(config.SVN_COMMITS_INDEX_TABLE) + self._register_temp_file(config.SVN_COMMITS_STORE) self._register_temp_file(config.CVS_REVS_TO_SVN_REVNUMS) - if not Ctx().trunk_only: self._register_temp_file(config.SYMBOL_OPENINGS_CLOSINGS) - self._register_temp_file_needed(config.SYMBOL_LAST_CVS_REVS_DB) self._register_temp_file_needed(config.CVS_FILES_DB) - self._register_temp_file_needed(config.CVS_ITEMS_RESYNC_STORE) - self._register_temp_file_needed(config.CVS_ITEMS_RESYNC_INDEX_TABLE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) self._register_temp_file_needed(config.SYMBOL_DB) self._register_temp_file_needed(config.METADATA_DB) - self._register_temp_file_needed(config.CVS_REVS_SORTED_DATAFILE) + self._register_temp_file_needed(config.CHANGESETS_ALLBROKEN_STORE) + self._register_temp_file_needed(config.CHANGESETS_ALLBROKEN_INDEX) + self._register_temp_file_needed(config.CHANGESETS_SORTED_DATAFILE) + + def get_changesets(self): + """Generate (changeset,timestamp,) tuples in commit order.""" + + changeset_db = ChangesetDatabase( + artifact_manager.get_temp_file(config.CHANGESETS_ALLBROKEN_STORE), + artifact_manager.get_temp_file(config.CHANGESETS_ALLBROKEN_INDEX), + DB_OPEN_READ) + + for line in file( + artifact_manager.get_temp_file( + config.CHANGESETS_SORTED_DATAFILE)): + [changeset_id, timestamp] = [int(s, 16) for s in line.strip().split()] + yield (changeset_db[changeset_id], timestamp) + + changeset_db.close() + + def get_svn_commits(self): + """Generate the SVNCommits, in order.""" + + creator = 
SVNCommitCreator() + for (changeset, timestamp) in self.get_changesets(): + for svn_commit in creator.process_changeset(changeset, timestamp): + yield svn_commit + + def log_svn_commit(self, svn_commit): + """Output information about SVN_COMMIT.""" + + Log().normal("Creating Subversion r%d (%s)" + % (svn_commit.revnum, svn_commit.description)) + + if isinstance(svn_commit, SVNRevisionCommit): + for cvs_rev in svn_commit.cvs_revs: + Log().verbose(' %s %s' % (cvs_rev.cvs_path, cvs_rev.rev,)) def run(self, stats_keeper): Log().quiet("Mapping CVS revisions to Subversion commits...") @@ -322,24 +1288,29 @@ class AggregateRevsPass(Pass): Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) Ctx()._symbol_db = SymbolDatabase() Ctx()._metadata_db = MetadataDatabase(DB_OPEN_READ) - Ctx()._cvs_items_db = OldIndexedCVSItemStore( - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_STORE), - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_INDEX_TABLE)) - if not Ctx().trunk_only: + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + Ctx()._symbolings_logger = SymbolingsLogger() - aggregator = CVSRevisionAggregator() - for line in file( - artifact_manager.get_temp_file(config.CVS_REVS_SORTED_DATAFILE)): - cvs_rev_id = int(line.strip().split()[-1], 16) - cvs_rev = Ctx()._cvs_items_db[cvs_rev_id] - if not (Ctx().trunk_only and isinstance(cvs_rev.lod, Branch)): - aggregator.process_revision(cvs_rev) - aggregator.flush() - if not Ctx().trunk_only: + + persistence_manager = PersistenceManager(DB_OPEN_NEW) + + for svn_commit in self.get_svn_commits(): + self.log_svn_commit(svn_commit) + persistence_manager.put_svn_commit(svn_commit) + + persistence_manager.close() Ctx()._symbolings_logger.close() Ctx()._cvs_items_db.close() + Ctx()._metadata_db.close() + Ctx()._symbol_db.close() + Ctx()._cvs_file_db.close() + 
stats_keeper.set_svn_rev_count(SVNCommit.revnum - 1) stats_keeper.archive() + Log().quiet("Done") @@ -347,14 +1318,12 @@ class SortSymbolsPass(Pass): """This pass was formerly known as pass6.""" def register_artifacts(self): - if not Ctx().trunk_only: self._register_temp_file(config.SYMBOL_OPENINGS_CLOSINGS_SORTED) self._register_temp_file_needed(config.SYMBOL_OPENINGS_CLOSINGS) def run(self, stats_keeper): Log().quiet("Sorting symbolic name source revisions...") - if not Ctx().trunk_only: sort_file( artifact_manager.get_temp_file(config.SYMBOL_OPENINGS_CLOSINGS), artifact_manager.get_temp_file( @@ -367,7 +1336,6 @@ class IndexSymbolsPass(Pass): """This pass was formerly known as pass7.""" def register_artifacts(self): - if not Ctx().trunk_only: self._register_temp_file(config.SYMBOL_OFFSETS_DB) self._register_temp_file_needed(config.SYMBOL_DB) self._register_temp_file_needed(config.SYMBOL_OPENINGS_CLOSINGS_SORTED) @@ -399,6 +1367,8 @@ class IndexSymbolsPass(Pass): old_id = id offsets[id] = fpos + f.close() + offsets_db = file( artifact_manager.get_temp_file(config.SYMBOL_OFFSETS_DB), 'wb') cPickle.dump(offsets, offsets_db, -1) @@ -406,10 +1376,9 @@ class IndexSymbolsPass(Pass): def run(self, stats_keeper): Log().quiet("Determining offsets for all symbolic names...") - - if not Ctx().trunk_only: Ctx()._symbol_db = SymbolDatabase() self.generate_offsets_for_symbolings() + Ctx()._symbol_db.close() Log().quiet("Done.") @@ -417,35 +1386,27 @@ class OutputPass(Pass): """This pass was formerly known as pass8.""" def register_artifacts(self): - self._register_temp_file(config.SVN_MIRROR_REVISIONS_DB) - self._register_temp_file(config.SVN_MIRROR_NODES_DB) + self._register_temp_file(config.SVN_MIRROR_REVISIONS_TABLE) + self._register_temp_file(config.SVN_MIRROR_NODES_INDEX_TABLE) + self._register_temp_file(config.SVN_MIRROR_NODES_STORE) self._register_temp_file_needed(config.CVS_FILES_DB) - self._register_temp_file_needed(config.CVS_ITEMS_RESYNC_STORE) - 
self._register_temp_file_needed(config.CVS_ITEMS_RESYNC_INDEX_TABLE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_STORE) + self._register_temp_file_needed(config.CVS_ITEMS_SORTED_INDEX_TABLE) self._register_temp_file_needed(config.SYMBOL_DB) self._register_temp_file_needed(config.METADATA_DB) - self._register_temp_file_needed(config.SVN_COMMITS_DB) + self._register_temp_file_needed(config.SVN_COMMITS_INDEX_TABLE) + self._register_temp_file_needed(config.SVN_COMMITS_STORE) self._register_temp_file_needed(config.CVS_REVS_TO_SVN_REVNUMS) - if not Ctx().trunk_only: self._register_temp_file_needed(config.SYMBOL_OPENINGS_CLOSINGS_SORTED) self._register_temp_file_needed(config.SYMBOL_OFFSETS_DB) + Ctx().revision_reader.register_artifacts(self) - def run(self, stats_keeper): - Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) - Ctx()._metadata_db = MetadataDatabase(DB_OPEN_READ) - Ctx()._cvs_items_db = OldIndexedCVSItemStore( - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_STORE), - artifact_manager.get_temp_file(config.CVS_ITEMS_RESYNC_INDEX_TABLE)) - if not Ctx().trunk_only: - Ctx()._symbol_db = SymbolDatabase() - repos = SVNRepositoryMirror() - persistence_manager = PersistenceManager(DB_OPEN_READ) - - Ctx().output_option.setup(repos) + def get_svn_commits(self): + """Generate the SVNCommits in commit order.""" - repos.add_delegate(StdoutDelegate(stats_keeper.svn_rev_count())) + persistence_manager = PersistenceManager(DB_OPEN_READ) - svn_revnum = 2 # Repository initialization is 1. + svn_revnum = 2 # The first non-trivial commit # Peek at the first revision to find the date to use to initialize # the repository: @@ -453,19 +1414,69 @@ class OutputPass(Pass): # Initialize the repository by creating the directories for trunk, # tags, and branches. 
- SVNInitialProjectCommit(svn_commit.date, 1).commit(repos) + yield SVNInitialProjectCommit(svn_commit.date, 1) - while True: + while svn_commit: + yield svn_commit + svn_revnum += 1 svn_commit = persistence_manager.get_svn_commit(svn_revnum) - if not svn_commit: - break + + persistence_manager.close() + + def run(self, stats_keeper): + Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) + Ctx()._metadata_db = MetadataDatabase(DB_OPEN_READ) + Ctx()._cvs_items_db = IndexedCVSItemStore( + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_STORE), + artifact_manager.get_temp_file(config.CVS_ITEMS_SORTED_INDEX_TABLE), + DB_OPEN_READ) + Ctx()._symbol_db = SymbolDatabase() + repos = SVNRepositoryMirror() + + Ctx().output_option.setup(repos) + + repos.add_delegate(StdoutDelegate(stats_keeper.svn_rev_count())) + + Ctx().revision_reader.start() + + for svn_commit in self.get_svn_commits(): svn_commit.commit(repos) - svn_revnum += 1 - repos.finish() + repos.close() - Ctx().output_option.cleanup() + Ctx().revision_reader.finish() + Ctx().output_option.cleanup() + Ctx()._symbol_db.close() Ctx()._cvs_items_db.close() + Ctx()._metadata_db.close() + Ctx()._cvs_file_db.close() + + +# The list of passes constituting a run of cvs2svn: +passes = [ + CollectRevsPass(), + CollateSymbolsPass(), + #CheckItemStoreDependenciesPass(config.CVS_ITEMS_STORE), + FilterSymbolsPass(), + #CheckIndexedItemStoreDependenciesPass( + # config.CVS_ITEMS_FILTERED_STORE, + # config.CVS_ITEMS_FILTERED_INDEX_TABLE), + SortRevisionSummaryPass(), + SortSymbolSummaryPass(), + InitializeChangesetsPass(), + #CheckIndexedItemStoreDependenciesPass( + # config.CVS_ITEMS_SORTED_STORE, + # config.CVS_ITEMS_SORTED_INDEX_TABLE), + BreakRevisionChangesetCyclesPass(), + RevisionTopologicalSortPass(), + BreakSymbolChangesetCyclesPass(), + BreakAllChangesetCyclesPass(), + TopologicalSortPass(), + CreateRevsPass(), + SortSymbolsPass(), + IndexSymbolsPass(), + OutputPass(), + ] diff -purNbBwx .svn 
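The OutputPass refactor above replaces an inline while-loop with a `get_svn_commits()` generator that walks revision numbers upward, starting at 2 because r1 is reserved for the repository-initialization commit, until the persistence manager returns None. Reduced to its essentials (with `lookup` standing in for `PersistenceManager.get_svn_commit`), the pattern is:

```python
def commits_in_order(lookup, first=2):
    """Yield commits for successive revision numbers, starting at FIRST,
    until LOOKUP returns None (no commit exists for that revnum)."""
    revnum = first
    commit = lookup(revnum)
    while commit is not None:
        yield commit
        revnum += 1
        commit = lookup(revnum)

# Usage: revnum 1 is reserved for repository initialization, so the
# walk begins at 2.
table = {2: 'r2-commit', 3: 'r3-commit'}
assert list(commits_in_order(table.get)) == ['r2-commit', 'r3-commit']
```

Turning the loop into a generator lets `run()` treat the commit stream as a plain iterable and keeps the open/close of the persistence manager in one place.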
cvs2svn-1.5.x/cvs2svn_lib/persistence_manager.py cvs2svn-2.0.0/cvs2svn_lib/persistence_manager.py --- cvs2svn-1.5.x/cvs2svn_lib/persistence_manager.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/persistence_manager.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -17,24 +17,25 @@ """This module contains class PersistenceManager.""" +import bisect + from cvs2svn_lib.boolean import * from cvs2svn_lib import config +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.common import DB_OPEN_READ from cvs2svn_lib.common import SVN_INVALID_REVNUM from cvs2svn_lib.log import Log from cvs2svn_lib.context import Ctx from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import Database -from cvs2svn_lib.database import PrimedPDatabase -from cvs2svn_lib.database import DB_OPEN_NEW -from cvs2svn_lib.database import DB_OPEN_READ -from cvs2svn_lib.svn_commit import SVNCommit +from cvs2svn_lib.record_table import SignedIntegerPacker +from cvs2svn_lib.record_table import RecordTable +from cvs2svn_lib.serializer import PrimedPickleSerializer +from cvs2svn_lib.database import IndexedDatabase from cvs2svn_lib.svn_commit import SVNRevisionCommit from cvs2svn_lib.svn_commit import SVNInitialProjectCommit from cvs2svn_lib.svn_commit import SVNPrimaryCommit from cvs2svn_lib.svn_commit import SVNSymbolCommit -from cvs2svn_lib.svn_commit import SVNPreCommit from cvs2svn_lib.svn_commit import SVNPostCommit -from cvs2svn_lib.svn_commit import SVNSymbolCloseCommit class PersistenceManager: @@ -56,57 +57,49 @@ class PersistenceManager: self.mode = mode if mode not in 
(DB_OPEN_NEW, DB_OPEN_READ): raise RuntimeError, "Invalid 'mode' argument to PersistenceManager" - self.svn_commit_db = PrimedPDatabase( - artifact_manager.get_temp_file(config.SVN_COMMITS_DB), mode, - (SVNInitialProjectCommit, SVNPrimaryCommit, SVNSymbolCommit, - SVNPreCommit, SVNPostCommit, SVNSymbolCloseCommit,)) - self.cvs2svn_db = Database( - artifact_manager.get_temp_file(config.CVS_REVS_TO_SVN_REVNUMS), mode) - - # branch_id -> svn_revnum in which branch was last filled. This - # is used by CVSCommit._pre_commit, to prevent creating a fill - # revision which would have nothing to do. The record with index - # None reflects the svn revision of the last SVNPostCommit. - self.last_filled = {} + primer = (SVNInitialProjectCommit, SVNPrimaryCommit, SVNSymbolCommit, + SVNPostCommit,) + serializer = PrimedPickleSerializer(primer) + self.svn_commit_db = IndexedDatabase( + artifact_manager.get_temp_file(config.SVN_COMMITS_INDEX_TABLE), + artifact_manager.get_temp_file(config.SVN_COMMITS_STORE), + mode, serializer) + self.cvs2svn_db = RecordTable( + artifact_manager.get_temp_file(config.CVS_REVS_TO_SVN_REVNUMS), + mode, SignedIntegerPacker(SVN_INVALID_REVNUM)) def get_svn_revnum(self, cvs_rev_id): """Return the Subversion revision number in which CVS_REV_ID was committed, or SVN_INVALID_REVNUM if there is no mapping for CVS_REV_ID.""" - return int(self.cvs2svn_db.get('%x' % (cvs_rev_id,), SVN_INVALID_REVNUM)) + return self.cvs2svn_db.get(cvs_rev_id, SVN_INVALID_REVNUM) def get_svn_commit(self, svn_revnum): """Return an SVNCommit that corresponds to SVN_REVNUM. - If no SVNCommit exists for revnum SVN_REVNUM, then return None. 
- - This method can throw SVNCommitInternalInconsistencyError.""" + If no SVNCommit exists for revnum SVN_REVNUM, then return None.""" - return self.svn_commit_db.get('%x' % svn_revnum, None) + return self.svn_commit_db.get(svn_revnum, None) def put_svn_commit(self, svn_commit): """Record the bidirectional mapping between SVN_REVNUM and CVS_REVS and record associated attributes.""" - Log().normal("Creating Subversion r%d (%s)" - % (svn_commit.revnum, svn_commit.description)) - if self.mode == DB_OPEN_READ: raise RuntimeError, \ 'Write operation attempted on read-only PersistenceManager' - self.svn_commit_db['%x' % svn_commit.revnum] = svn_commit + self.svn_commit_db[svn_commit.revnum] = svn_commit if isinstance(svn_commit, SVNRevisionCommit): for cvs_rev in svn_commit.cvs_revs: - Log().verbose(' %s %s' % (cvs_rev.cvs_path, cvs_rev.rev,)) - self.cvs2svn_db['%x' % cvs_rev.id] = svn_commit.revnum + self.cvs2svn_db[cvs_rev.id] = svn_commit.revnum - # If it is a symbol commit, then record last_filled. - if isinstance(svn_commit, SVNSymbolCommit): - self.last_filled[svn_commit.symbol.id] = svn_commit.revnum - elif isinstance(svn_commit, SVNPostCommit): - self.last_filled[None] = svn_commit.revnum + def close(self): + self.cvs2svn_db.close() + self.cvs2svn_db = None + self.svn_commit_db.close() + self.svn_commit_db = None diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/primed_pickle.py cvs2svn-2.0.0/cvs2svn_lib/primed_pickle.py --- cvs2svn-1.5.x/cvs2svn_lib/primed_pickle.py 2006-09-14 22:39:51.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/primed_pickle.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,111 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. -# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. 
The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. -# ==================================================================== - -"""Picklers and unpicklers that are primed with known objects.""" - -from __future__ import generators - -import cStringIO -import cPickle - -from cvs2svn_lib.boolean import * - - -def get_memos(primer): - """Return a tuple (pickler_memo, unpickler_memo,) for primer. - - These memos can be used to create picklers and unpicklers, - respectively, that are 'pre-trained' to recognize the objects that - are in PRIMER. Note that the memos needed for pickling and - unpickling are different.""" - - f = cStringIO.StringIO() - pickler = cPickle.Pickler(f, -1) - pickler.dump(primer) - unpickler = cPickle.Unpickler(cStringIO.StringIO(f.getvalue())) - unpickler.load() - return pickler.memo, unpickler.memo - - -class PrimedPickler: - """This class acts as a pickler with a pre-initialized memo. - - A new pickler is created for each call to dumpf or dumps, each time - with the memo initialize to self.memo.""" - - def __init__(self, memo): - """Prepare to make picklers with memos initialized to MEMO.""" - - self.memo = memo - - def create_pickler(self, f): - """Return a new pickler with its memo initialized to SELF.memo.""" - - pickler = cPickle.Pickler(f, -1) - pickler.memo = self.memo.copy() - return pickler - - def dumpf(self, f, object): - """Pickle OBJECT to file-like object F. - - A new pickler, initialized with SELF.memo, is used for each call - to this method.""" - - self.create_pickler(f).dump(object) - - def dumps(self, object): - """Return a string containing OBJECT in pickled form. 
- - A new pickler, initialized with SELF.memo, is used for each call - to this method.""" - - f = cStringIO.StringIO() - self.create_pickler(f).dump(object) - return f.getvalue() - - -class PrimedUnpickler: - """This class acts as an unpickler with a pre-initialized memo.""" - - def __init__(self, memo): - """Prepare to make picklers with memos initialized to MEMO.""" - - self.memo = memo - - def create_unpickler(self, f): - """Return a new unpickler with its memo initialized to SELF.memo.""" - - unpickler = cPickle.Unpickler(f) - unpickler.memo = self.memo.copy() - return unpickler - - def loadf(self, f): - """Return the next object unpickled from file-like object F. - - A new unpickler, initialized with SELF.memo, is used for each call - to this method.""" - - return self.create_unpickler(f).load() - - def loads(self, s): - """Return the object unpickled from string S. - - A new unpickler, initialized with SELF.memo, is used for each call - to this method.""" - - return self.create_unpickler(cStringIO.StringIO(s)).load() - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/process.py cvs2svn-2.0.0/cvs2svn_lib/process.py --- cvs2svn-1.5.x/cvs2svn_lib/process.py 2007-01-26 17:55:36.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/process.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/project.py cvs2svn-2.0.0/cvs2svn_lib/project.py --- cvs2svn-1.5.x/cvs2svn_lib/project.py 2006-10-03 12:17:51.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/project.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) 
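The deleted primed_pickle.py (superseded by `serializer.PrimedPickleSerializer` in 2.0.0) pre-trains pickler and unpickler memos on a "primer" of frequently-used objects, so each later pickle references them via short memo GET opcodes instead of re-serializing them; cvs2svn primes with its SVNCommit classes. The same idea can be sketched on Python 3 with the pure-Python pickler `pickle._Pickler`, whose memo is a plain dict; relying on that underscore name is an implementation-detail assumption of this sketch:

```python
import io
import pickle

def get_memos(primer):
    """Pickle PRIMER once and return (pickler_memo, unpickler_memo)
    trained on its contents.  The two memo formats differ, which is
    why both sides must be primed from the same pickle."""
    f = io.BytesIO()
    pickler = pickle._Pickler(f, -1)   # pure-Python: .memo is a dict
    pickler.dump(primer)
    unpickler = pickle._Unpickler(io.BytesIO(f.getvalue()))
    unpickler.load()
    return pickler.memo, unpickler.memo

def primed_dumps(memo, obj):
    """Pickle OBJ with a pickler whose memo starts as a copy of MEMO."""
    f = io.BytesIO()
    pickler = pickle._Pickler(f, -1)
    pickler.memo = memo.copy()
    pickler.dump(obj)
    return f.getvalue()

def primed_loads(memo, data):
    """Unpickle DATA with an unpickler primed with a copy of MEMO."""
    unpickler = pickle._Unpickler(io.BytesIO(data))
    unpickler.memo = memo.copy()
    return unpickler.load()
```

Because a fresh memo copy is used per call, each output is an independent pickle stream; the space win comes from primed objects collapsing to memo references.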
# # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -25,11 +25,8 @@ from cvs2svn_lib.boolean import * from cvs2svn_lib.context import Ctx from cvs2svn_lib.common import path_join from cvs2svn_lib.common import path_split -from cvs2svn_lib.common import error_prefix from cvs2svn_lib.common import FatalError from cvs2svn_lib.log import Log -from cvs2svn_lib.cvs_repository import CVSRepositoryViaCVS -from cvs2svn_lib.cvs_repository import CVSRepositoryViaRCS from cvs2svn_lib.cvs_file import CVSFile @@ -40,7 +37,13 @@ def verify_paths_disjoint(*paths): that 'a/b/c/d' is nested in 'a/b'), or any two paths are identical, write an error message and exit.""" - paths = [(path.split('/'), path) for path in paths] + def split(path): + if not path: + return [] + else: + return path.split('/') + + paths = [(split(path), path) for path in paths] # If all overlapping elements are equal, a shorter list is # considered "less than" a longer one. Therefore if any paths are # nested, this sort will leave at least one such pair adjacent, in @@ -51,40 +54,52 @@ def verify_paths_disjoint(*paths): split_path2, path2 = paths[i] if len(split_path1) <= len(split_path2) \ and split_path2[:len(split_path1)] == split_path1: - raise FatalError("paths %s and %s are not disjoint." % (path1, path2,)) + raise FatalError( + 'paths "%s" and "%s" are not disjoint.' % (path1, path2,) + ) -def normalize_ttb_path(opt, path): +def normalize_ttb_path(opt, path, allow_empty=False): """Normalize a path to be used for --trunk, --tags, or --branches. 1. Strip leading, trailing, and duplicated '/'. - 2. Verify that the path is not empty. + 2. If ALLOW_EMPTY is not set, verify that PATH is not empty. Return the normalized path. 
If the path is invalid, write an error message and exit.""" norm_path = path_join(*path.split('/')) - if not norm_path: + if not allow_empty and not norm_path: raise FatalError("cannot pass an empty path to %s." % (opt,)) return norm_path -OS_SEP_PLUS_ATTIC = os.sep + 'Attic' +class FileInAndOutOfAtticException(Exception): + def __init__(self, non_attic_path, attic_path): + Exception.__init__( + self, + "A CVS repository cannot contain both %s and %s" + % (non_attic_path, attic_path)) + self.non_attic_path = non_attic_path + self.attic_path = attic_path -class Project: + +class Project(object): """A project within a CVS repository.""" def __init__(self, project_cvs_repos_path, - trunk_path, branches_path, tags_path, + trunk_path, branches_path=None, tags_path=None, symbol_transforms=None): """Create a new Project record. PROJECT_CVS_REPOS_PATH is the main CVS directory for this project (within the filesystem). TRUNK_PATH, BRANCHES_PATH, and TAGS_PATH are the full, normalized directory names in svn for the - corresponding part of the repository. + corresponding part of the repository. (BRANCHES_PATH and + TAGS_PATH do not have to be specified for a --trunk-only + conversion.) SYMBOL_TRANSFORMS is a list of SymbolTransform instances which will be used to transform any symbol names within this project.""" @@ -92,29 +107,34 @@ class Project: # A unique id for this project, also used as its index in # Ctx().projects. This field is filled in by Ctx.add_project(). self.id = None - self.project_cvs_repos_path = os.path.normpath(project_cvs_repos_path) - if Ctx().use_cvs: - self.cvs_repository = CVSRepositoryViaCVS(self.project_cvs_repos_path) - else: - self.cvs_repository = CVSRepositoryViaRCS(self.project_cvs_repos_path) + self.project_cvs_repos_path = os.path.normpath(project_cvs_repos_path) + if not os.path.isdir(self.project_cvs_repos_path): + raise FatalError("The specified CVS repository path '%s' is not an " + "existing directory." 
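The `verify_paths_disjoint()` change above relies on a sorting trick: once the paths are sorted by their component lists, any path that is a prefix of another ends up immediately before it, so a single pass over adjacent pairs suffices. A condensed sketch, with `ValueError` standing in for cvs2svn's `FatalError` and the same empty-path handling the new `split` helper adds:

```python
def verify_paths_disjoint(*paths):
    """Raise ValueError if any path equals or is nested inside another.

    Sorting the component lists puts a prefix immediately before every
    path it prefixes, so only adjacent pairs need comparing."""
    split = sorted((p.split('/') if p else [], p) for p in paths)
    for (sp1, p1), (sp2, p2) in zip(split, split[1:]):
        if sp2[:len(sp1)] == sp1:
            raise ValueError(
                'paths "%s" and "%s" are not disjoint.' % (p1, p2))
```

This is how --trunk/--branches/--tags values are validated against each other before conversion starts.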
% self.project_cvs_repos_path) + + self.cvs_repository_root, self.cvs_module = \ + self.determine_repository_root( + os.path.abspath(self.project_cvs_repos_path)) # A regexp matching project_cvs_repos_path plus an optional separator: self.project_prefix_re = re.compile( r'^' + re.escape(self.project_cvs_repos_path) + r'(' + re.escape(os.sep) + r'|$)') - # The project's main directory as a cvs_path: - self.project_cvs_path = \ - self.project_cvs_repos_path[len(self.cvs_repository.cvs_repos_path):] - if self.project_cvs_path.startswith(os.sep): - self.project_cvs_path = self.project_cvs_path[1:] - - self.trunk_path = normalize_ttb_path('--trunk', trunk_path) + self.trunk_path = normalize_ttb_path( + '--trunk', trunk_path, allow_empty=Ctx().trunk_only + ) + if Ctx().trunk_only: + self._unremovable_paths = [self.trunk_path] + else: self.branches_path = normalize_ttb_path('--branches', branches_path) self.tags_path = normalize_ttb_path('--tags', tags_path) - verify_paths_disjoint(self.trunk_path, self.branches_path, self.tags_path) + verify_paths_disjoint( + self.trunk_path, self.branches_path, self.tags_path + ) self._unremovable_paths = [ - self.trunk_path, self.branches_path, self.tags_path] + self.trunk_path, self.branches_path, self.tags_path + ] # A list of transformation rules (regexp, replacement) applied to # symbol names in this project. @@ -123,12 +143,86 @@ class Project: else: self.symbol_transforms = symbol_transforms + def __eq__(self, other): + return self.id == other.id + def __cmp__(self, other): - return cmp(self.id, other.id) + return cmp(self.cvs_module, other.cvs_module) \ + or cmp(self.id, other.id) def __hash__(self): return self.id + def determine_repository_root(path): + """Ascend above the specified PATH if necessary to find the + cvs_repository_root (a directory containing a CVSROOT directory) + and the cvs_module (the path of the conversion root within the cvs + repository). 
Return the root path and the module path of this + project relative to the root. + + NB: cvs_module must be seperated by '/', *not* by os.sep.""" + + def is_cvs_repository_root(path): + return os.path.isdir(os.path.join(path, 'CVSROOT')) + + original_path = path + cvs_module = '' + while not is_cvs_repository_root(path): + # Step up one directory: + prev_path = path + path, module_component = os.path.split(path) + if path == prev_path: + # Hit the root (of the drive, on Windows) without finding a + # CVSROOT dir. + raise FatalError( + "the path '%s' is not a CVS repository, nor a path " + "within a CVS repository. A CVS repository contains " + "a CVSROOT directory within its root directory." + % (original_path,)) + + cvs_module = module_component + "/" + cvs_module + + return path, cvs_module + + determine_repository_root = staticmethod(determine_repository_root) + + ctrl_characters_regexp = re.compile('[\\\x00-\\\x1f\\\x7f]') + + def verify_filename_legal(path, filename): + """Verify that FILENAME is a legal filename. + + FILENAME is a path component of a CVS path. Check that it won't + choke SVN: + + - Check that it is not empty. + + - Check that it is not equal to '.' or '..'. + + - Check that the filename does not include any control characters. + + If any of these tests fail, raise a FatalError. PATH is the full + filesystem path from which FILENAME was derived; it can be used in + error messages.""" + + if filename == '': + raise FatalError( + "File %s would result in an empty filename." % (path,) + ) + + if filename in ['.', '..']: + raise FatalError( + "File %s would result in an illegal filename '%s'." + % (path, filename,) + ) + + m = Project.ctrl_characters_regexp.search(filename) + if m: + raise FatalError( + "Character %r in filename %r is not supported by Subversion." 
+ % (m.group(), filename,)) + + verify_filename_legal = staticmethod(verify_filename_legal) + def _get_cvs_path(self, filename): """Return the path to FILENAME relative to project_cvs_repos_path. @@ -146,23 +240,44 @@ class Project: tail = tail[:-2] return tail.replace(os.sep, '/') - def get_cvs_file(self, filename): + def is_file_in_attic(self, filename): + """Return True iff FILENAME is in an Attic subdirectory. + + FILENAME is the filesystem path to a '*,v' file.""" + + dirname = os.path.dirname(filename) + (dirname2, basename2,) = os.path.split(dirname) + return basename2 == 'Attic' and self.project_prefix_re.match(dirname2) + + def get_cvs_file(self, filename, leave_in_attic=False): """Return a CVSFile describing the file with name FILENAME. FILENAME must be a *,v file within this project. The CVSFile is assigned a new unique id. All of the CVSFile information is filled in except mode (which can only be determined by parsing the - file).""" + file). + + If LEAVE_IN_ATTIC is True, then leave the 'Attic' component in the + filename. Otherwise, raise FileInAndOutOfAtticException if the + file is in Attic, and a file with the same filename appears + outside of Attic. 
+ Raise FatalError if the resulting filename would not be legal in + SVN.""" + + self.verify_filename_legal(filename, os.path.basename(filename)[:-2]) + + if leave_in_attic or not self.is_file_in_attic(filename): + canonical_filename = filename + else: (dirname, basename,) = os.path.split(filename) - if dirname.endswith(OS_SEP_PLUS_ATTIC): + # If this file also exists outside of the attic, it's a fatal error + non_attic_filename = os.path.join(os.path.dirname(dirname), basename) + if os.path.exists(non_attic_filename): + raise FileInAndOutOfAtticException(non_attic_filename, filename) + # drop the 'Attic' portion from the filename for the canonical name: - canonical_filename = os.path.join( - dirname[:-len(OS_SEP_PLUS_ATTIC)], basename) - file_in_attic = True - else: - canonical_filename = filename - file_in_attic = False + canonical_filename = non_attic_filename file_stat = os.stat(filename) @@ -175,14 +290,16 @@ class Project: # mode is not known, so we temporarily set it to None. return CVSFile( None, self, filename, self._get_cvs_path(canonical_filename), - file_in_attic, file_executable, file_size, None + file_executable, file_size, None ) def is_source(self, svn_path): """Return True iff SVN_PATH is a legitimate source for this project. Legitimate paths are self.trunk_path or any directory directly - under self.branches_path.""" + under self.branches_path. + + This routine must not be called during --trunk-only conversions.""" if svn_path == self.trunk_path: return True @@ -198,43 +315,34 @@ class Project: return svn_path in self._unremovable_paths - def get_branch_path(self, branch_symbol): - """Return the svnpath for BRANCH_SYMBOL.""" - - return path_join(self.branches_path, branch_symbol.get_clean_name()) + def get_trunk_path(self, *components): + """Return the trunk path. 
- def get_tag_path(self, tag_symbol): - """Return the svnpath for TAG_SYMBOL.""" + Also append any cvs path components from COMPONENTS.""" - return path_join(self.tags_path, tag_symbol.get_clean_name()) + return path_join(self.trunk_path, *components) - def _relative_name(self, cvs_path): - """Convert CVS_PATH into a name relative to this project's root directory. + def get_branch_path(self, branch_symbol, *components): + """Return the svnpath for BRANCH_SYMBOL. - CVS_PATH has to begin (textually) with self.project_cvs_path. - Remove prefix and optional '/'.""" + Also append any cvs path components from COMPONENTS. - if not cvs_path.startswith(self.project_cvs_path): - raise FatalError( - "_relative_name: '%s' is not a sub-path of '%s'" - % (cvs_path, self.project_cvs_path,)) - l = len(self.project_cvs_path) - if cvs_path[l] == os.sep: - l += 1 - return cvs_path[l:] + This routine must not be called during --trunk-only conversions.""" - def make_trunk_path(self, cvs_path): - """Return the trunk path for CVS_PATH. + return path_join( + self.branches_path, branch_symbol.get_clean_name(), *components + ) - Return the svn path for this file on trunk.""" + def get_tag_path(self, tag_symbol, *components): + """Return the svnpath for TAG_SYMBOL. - return path_join(self.trunk_path, self._relative_name(cvs_path)) + Also append any cvs path components from COMPONENTS. 
- def make_branch_path(self, branch_symbol, cvs_path): - """Return the svn path for CVS_PATH on branch BRANCH_SYMBOL.""" + This routine must not be called during --trunk-only conversions.""" - return path_join(self.get_branch_path(branch_symbol), - self._relative_name(cvs_path)) + return path_join( + self.tags_path, tag_symbol.get_clean_name(), *components + ) def transform_symbol(self, cvs_file, name): """Transform the symbol NAME using the renaming rules specified diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/property_setters.py cvs2svn-2.0.0/cvs2svn_lib/property_setters.py --- cvs2svn-1.5.x/cvs2svn_lib/property_setters.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/property_setters.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -31,7 +31,10 @@ class SVNPropertySetter: """Abstract class for objects that can set properties on a SVNCommitItem.""" def set_properties(self, s_item): - """Set any properties that can be determined for S_ITEM.""" + """Set any properties that can be determined for S_ITEM. + + S_ITEM is an instance of SVNCommitItem. 
This method should modify + S_ITEM.svn_props in place.""" raise NotImplementedError @@ -39,30 +42,47 @@ class SVNPropertySetter: class CVSRevisionNumberSetter(SVNPropertySetter): """Set the cvs2svn:cvs-rev property to the CVS revision number.""" + propname = 'cvs2svn:cvs-rev' + def set_properties(self, s_item): - s_item.svn_props['cvs2svn:cvs-rev'] = s_item.cvs_rev.rev + if self.propname in s_item.svn_props: + return + + s_item.svn_props[self.propname] = s_item.cvs_rev.rev s_item.svn_props_changed = True class ExecutablePropertySetter(SVNPropertySetter): """Set the svn:executable property based on cvs_rev.cvs_file.executable.""" + propname = 'svn:executable' + def set_properties(self, s_item): + if self.propname in s_item.svn_props: + return + if s_item.cvs_rev.cvs_file.executable: - s_item.svn_props['svn:executable'] = '*' + s_item.svn_props[self.propname] = '*' + +class CVSBinaryFileEOLStyleSetter(SVNPropertySetter): + """Set the eol-style to None for files with CVS mode '-kb'.""" -class BinaryFileEOLStyleSetter(SVNPropertySetter): - """Set the eol-style for binary files to None.""" + propname = 'svn:eol-style' def set_properties(self, s_item): + if self.propname in s_item.svn_props: + return + if s_item.cvs_rev.cvs_file.mode == 'b': - s_item.svn_props['svn:eol-style'] = None + s_item.svn_props[self.propname] = None class MimeMapper(SVNPropertySetter): """A class that provides mappings from file names to MIME types.""" + propname = 'svn:mime-type' + def __init__(self, mime_types_file): self.mappings = { } @@ -83,6 +103,9 @@ class MimeMapper(SVNPropertySetter): self.mappings[ext] = type def set_properties(self, s_item): + if self.propname in s_item.svn_props: + return + basename, extension = os.path.splitext( os.path.basename(s_item.cvs_rev.cvs_path) ) @@ -99,7 +122,7 @@ class MimeMapper(SVNPropertySetter): mime_type = self.mappings.get(extension, None) if mime_type is not None: - s_item.svn_props['svn:mime-type'] = mime_type + s_item.svn_props[self.propname] = 
mime_type class AutoPropsPropertySetter(SVNPropertySetter): @@ -129,7 +152,7 @@ class AutoPropsPropertySetter(SVNPropert return fnmatch.fnmatch(basename, self.pattern) - def __init__(self, configfilename, ignore_case): + def __init__(self, configfilename, ignore_case=True): config = ConfigParser.ConfigParser() if ignore_case: self.transform_case = self.squash_case @@ -193,14 +216,18 @@ class AutoPropsPropertySetter(SVNPropert s_item.svn_props[k] = v -class BinaryFileDefaultMimeTypeSetter(SVNPropertySetter): +class CVSBinaryFileDefaultMimeTypeSetter(SVNPropertySetter): """If the file is binary and its svn:mime-type property is not yet set, set it to 'application/octet-stream'.""" + propname = 'svn:mime-type' + def set_properties(self, s_item): - if 'svn:mime-type' not in s_item.svn_props \ - and s_item.cvs_rev.cvs_file.mode == 'b': - s_item.svn_props['svn:mime-type'] = 'application/octet-stream' + if self.propname in s_item.svn_props: + return + + if s_item.cvs_rev.cvs_file.mode == 'b': + s_item.svn_props[self.propname] = 'application/octet-stream' class EOLStyleFromMimeTypeSetter(SVNPropertySetter): @@ -211,32 +238,55 @@ class EOLStyleFromMimeTypeSetter(SVNProp starts with 'text/', then set svn:eol-style to native; otherwise, force it to remain unset. 
See also issue #39.""" + propname = 'svn:eol-style' + def set_properties(self, s_item): - if 'svn:eol-style' not in s_item.svn_props \ - and s_item.svn_props.get('svn:mime-type', None) is not None: + if self.propname in s_item.svn_props: + return + + if s_item.svn_props.get('svn:mime-type', None) is not None: if s_item.svn_props['svn:mime-type'].startswith("text/"): - s_item.svn_props['svn:eol-style'] = 'native' + s_item.svn_props[self.propname] = 'native' else: - s_item.svn_props['svn:eol-style'] = None + s_item.svn_props[self.propname] = None class DefaultEOLStyleSetter(SVNPropertySetter): """Set the eol-style if one has not already been set.""" + propname = 'svn:eol-style' + def __init__(self, value): """Initialize with the specified default VALUE.""" self.value = value def set_properties(self, s_item): - if 'svn:eol-style' not in s_item.svn_props: - s_item.svn_props['svn:eol-style'] = self.value + if self.propname in s_item.svn_props: + return + + s_item.svn_props[self.propname] = self.value + + +class SVNBinaryFileKeywordsPropertySetter(SVNPropertySetter): + """Turn off svn:keywords for files with binary svn:eol-style.""" + + propname = 'svn:keywords' + + def set_properties(self, s_item): + if self.propname in s_item.svn_props: + return + + if not s_item.svn_props.get('svn:eol-style'): + s_item.svn_props[self.propname] = None class KeywordsPropertySetter(SVNPropertySetter): """If the svn:keywords property is not yet set, set it based on the file's mode. 
See issue #2.""" + propname = 'svn:keywords' + def __init__(self, value): """Use VALUE for the value of the svn:keywords property if it is to be set.""" @@ -244,8 +294,10 @@ class KeywordsPropertySetter(SVNProperty self.value = value def set_properties(self, s_item): - if 'svn:keywords' not in s_item.svn_props \ - and s_item.cvs_rev.cvs_file.mode in [None, 'kv', 'kvl']: - s_item.svn_props['svn:keywords'] = self.value + if self.propname in s_item.svn_props: + return + + if s_item.cvs_rev.cvs_file.mode in [None, 'kv', 'kvl']: + s_item.svn_props[self.propname] = self.value diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/rcs_stream.py cvs2svn-2.0.0/cvs2svn_lib/rcs_stream.py --- cvs2svn-1.5.x/cvs2svn_lib/rcs_stream.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/rcs_stream.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,122 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module processes RCS diffs (deltas).""" + + +import re + +def msplit(s): + """Split S into an array of lines. + + Only \n is a line separator. 
The line endings are part of the lines.""" + + # return s.splitlines(True) clobbers \r + re = [ i + "\n" for i in s.split("\n") ] + re[-1] = re[-1][:-1] + if not re[-1]: + del re[-1] + return re + + +class RCSStream: + """This class represents a single file object to which RCS deltas can be + applied in various ways.""" + + ad_command = re.compile(r'^([ad])(\d+)\s(\d+)\n$') + a_command = re.compile(r'^a(\d+)\s(\d+)\n$') + + def __init__(self, text): + """Instantiate and initialize the file content with TEXT.""" + + self._texts = msplit(text) + + def get_text(self): + """Return the current file content.""" + + return "".join(self._texts) + + def apply_diff(self, diff): + """Apply the RCS diff DIFF to the current file content.""" + + ntexts = [] + ooff = 0 + diffs = msplit(diff) + i = 0 + while i < len(diffs): + admatch = self.ad_command.match(diffs[i]) + if not admatch: + raise RuntimeError, 'Error parsing diff commands' + i += 1 + sl = int(admatch.group(2)) + cn = int(admatch.group(3)) + if admatch.group(1) == 'd': # "d" - Delete command + ntexts += self._texts[ooff:sl - 1] + ooff = sl - 1 + cn + else: # "a" - Add command + ntexts += self._texts[ooff:sl] + diffs[i:i + cn] + ooff = sl + i += cn + self._texts = ntexts + self._texts[ooff:] + + def invert_diff(self, diff): + """Apply the RCS diff DIFF to the current file content and simultaneously + generate an RCS diff suitable for reverting the change.""" + + ntexts = [] + ooff = 0 + diffs = msplit(diff) + ndiffs = [] + adjust = 0 + i = 0 + while i < len(diffs): + admatch = self.ad_command.match(diffs[i]) + if not admatch: + raise RuntimeError, 'Error parsing diff commands' + i += 1 + sl = int(admatch.group(2)) + cn = int(admatch.group(3)) + if admatch.group(1) == 'd': # "d" - Delete command + # Handle substitution explicitly, as add must come after del + # (last add may end in no newline, so no command can follow). 
+ if i < len(diffs): + amatch = self.a_command.match(diffs[i]) + else: + amatch = None + if amatch and int(amatch.group(1)) == sl - 1 + cn: + cn2 = int(amatch.group(2)) + i += 1 + ndiffs += ["d%d %d\na%d %d\n" % \ + (sl + adjust, cn2, sl - 1 + adjust + cn2, cn)] + \ + self._texts[sl - 1:sl - 1 + cn] + ntexts += self._texts[ooff:sl - 1] + diffs[i:i + cn2] + adjust += cn2 - cn + i += cn2 + else: + ndiffs += ["a%d %d\n" % (sl - 1 + adjust, cn)] + \ + self._texts[sl - 1:sl - 1 + cn] + ntexts += self._texts[ooff:sl - 1] + adjust -= cn + ooff = sl - 1 + cn + else: # "a" - Add command + ndiffs += ["d%d %d\n" % (sl + 1 + adjust, cn)] + ntexts += self._texts[ooff:sl] + diffs[i:i + cn] + ooff = sl + adjust += cn + i += cn + self._texts = ntexts + self._texts[ooff:] + return "".join(ndiffs) + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/record_table.py cvs2svn-2.0.0/cvs2svn_lib/record_table.py --- cvs2svn-1.5.x/cvs2svn_lib/record_table.py 2006-09-16 23:34:23.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/record_table.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -22,82 +22,420 @@ index sequence leave gaps in the data fi efficiency the indexes of existing records should be approximately continuous. -The two classes in this module are abstract. Deriving classes have to -specify how to pack records into strings and unpack strings into -records y overwriting the pack()/unpack() methods. Arbitrary records -can be written as long as they can be converted to fixed-length -strings. 
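The `dS N` / `aS N` command stream that `rcs_stream.py`'s `apply_diff` parses above is simple enough to demonstrate standalone. A minimal Python 3 sketch of the same apply logic (names are illustrative, not cvs2svn's API):

```python
import re

# An RCS delta is a sequence of commands over the OLD text:
#   dS N -- delete N lines starting at (1-based) line S
#   aS N -- append the next N lines of the delta after old line S
_CMD = re.compile(r'^([ad])(\d+) (\d+)\n$')

def apply_rcs_diff(old_lines, diff_lines):
    """Apply an RCS delta to a list of lines (each ending in '\n')."""
    out = []
    off = 0          # index of the next unconsumed old line
    i = 0
    while i < len(diff_lines):
        m = _CMD.match(diff_lines[i])
        if not m:
            raise ValueError('bad diff command: %r' % diff_lines[i])
        cmd, start, count = m.group(1), int(m.group(2)), int(m.group(3))
        i += 1
        if cmd == 'd':
            out += old_lines[off:start - 1]   # keep lines before the deletion
            off = start - 1 + count           # skip the deleted lines
        else:
            out += old_lines[off:start]       # keep through old line `start`
            out += diff_lines[i:i + count]    # splice in the added lines
            off = start
            i += count
    return out + old_lines[off:]

# Replace line 2 of a three-line text:
print(apply_rcs_diff(["one\n", "two\n", "three\n"],
                     ["d2 1\n", "a2 1\n", "TWO\n"]))
```

`invert_diff` builds on the same walk, additionally emitting the commands that would undo each delete/add pair.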
- -Note that these classes do not keep track of which records have been -written, aside from keeping track of the highest record number that -was ever written. If an unwritten record is read, then the unpack() -method will be passes a string containing only NUL characters.""" +To use a RecordTable, you need a class derived from Packer which can +serialize/deserialize your records into fixed-size strings. Deriving +classes have to specify how to pack records into strings and unpack +strings into records by overwriting the pack() and unpack() methods +respectively. + +Note that these classes keep track of gaps in the records that have +been written by filling them with packer.empty_value. If a record is +read which contains packer.empty_value, then a KeyError is raised.""" from __future__ import generators import os +import types +import struct +import mmap from cvs2svn_lib.boolean import * +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.common import DB_OPEN_WRITE +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.log import Log -class NewRecordTable: - def __init__(self, filename, record_len): - self.f = open(filename, 'wb+') +# A unique value that can be used to stand for "unset" without +# preventing the use of None. 
+_unset = object() + + +class Packer(object): + def __init__(self, record_len, empty_value=None): self.record_len = record_len - self.max_memory_cache = 128 * 1024 / self.record_len - self.cache = {} + if empty_value is None: + self.empty_value = '\0' * self.record_len + else: + assert type(empty_value) is types.StringType + assert len(empty_value) == self.record_len + self.empty_value = empty_value + + def pack(self, v): + """Pack record V into a string of length self.record_len.""" + + raise NotImplementedError() + + def unpack(self, s): + """Unpack string S into a record.""" + + raise NotImplementedError() + + +class StructPacker(Packer): + def __init__(self, format, empty_value=_unset): + self.format = format + if empty_value is not _unset: + empty_value = self.pack(empty_value) + else: + empty_value = None + + Packer.__init__(self, struct.calcsize(self.format), + empty_value=empty_value) + + def pack(self, v): + return struct.pack(self.format, v) + + def unpack(self, v): + return struct.unpack(self.format, v)[0] + + +class UnsignedIntegerPacker(StructPacker): + def __init__(self, empty_value=0): + StructPacker.__init__(self, '=I', empty_value) + + +class SignedIntegerPacker(StructPacker): + def __init__(self, empty_value=0): + StructPacker.__init__(self, '=i', empty_value) + + +class FileOffsetPacker(Packer): + """A packer suitable for file offsets. + + We store the 5 least significant bytes of the file offset. This is + enough bits to represent 1 TiB. Of course if the computer + doesn't have large file support, only the lowest 31 bits can be + nonzero, and the offsets are limited to 2 GiB.""" + + # Convert file offsets to 8-bit little-endian unsigned longs... + INDEX_FORMAT = '<Q' + # ...but then truncate to 5 bytes. 
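The 5-byte truncation described in the comment above works because the format is little-endian: dropping the trailing bytes drops only the most significant bits. A standalone round-trip check of the same idea (not the project's class, just the arithmetic):

```python
import struct

INDEX_FORMAT = '<Q'      # 8-byte little-endian unsigned long long
INDEX_FORMAT_LEN = 5     # keep the 5 least significant bytes (up to 1 TiB)
PAD = b'\0' * (struct.calcsize(INDEX_FORMAT) - INDEX_FORMAT_LEN)

def pack_offset(v):
    # Pack to 8 bytes, then truncate to the low-order 5.
    return struct.pack(INDEX_FORMAT, v)[:INDEX_FORMAT_LEN]

def unpack_offset(s):
    # Re-pad with zero bytes before unpacking.
    return struct.unpack(INDEX_FORMAT, s + PAD)[0]

off = 123456789012                     # fits in 5 bytes (< 2**40)
packed = pack_offset(off)
assert len(packed) == INDEX_FORMAT_LEN
assert unpack_offset(packed) == off
```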
+ INDEX_FORMAT_LEN = 5 + + PAD = '\0' * (struct.calcsize(INDEX_FORMAT) - INDEX_FORMAT_LEN) + + def __init__(self): + Packer.__init__(self, self.INDEX_FORMAT_LEN) + + def pack(self, v): + return struct.pack(self.INDEX_FORMAT, v)[:self.INDEX_FORMAT_LEN] + + def unpack(self, s): + return struct.unpack(self.INDEX_FORMAT, s + self.PAD)[0] + + +class RecordTableAccessError(RuntimeError): + pass + + +class RecordTable: + # The approximate amount of memory that should be used for the cache + # for each instance of this class: + CACHE_MEMORY = 4 * 1024 * 1024 + + # Empirically, each entry in the cache table has an overhead of + # about 96 bytes on a 32-bit computer. + CACHE_OVERHEAD_PER_ENTRY = 96 + + def __init__(self, filename, mode, packer, cache_memory=CACHE_MEMORY): + self.filename = filename + self.mode = mode + if self.mode == DB_OPEN_NEW: + self.f = open(self.filename, 'wb+') + elif self.mode == DB_OPEN_WRITE: + self.f = open(self.filename, 'rb+') + elif self.mode == DB_OPEN_READ: + self.f = open(self.filename, 'rb') + else: + raise RuntimeError('Invalid mode %r' % self.mode) + self.packer = packer + self.cache_memory = cache_memory + + # Number of items that can be stored in the write cache. + self._max_memory_cache = ( + self.cache_memory + / (self.CACHE_OVERHEAD_PER_ENTRY + self.packer.record_len)) + + # Read and write cache; a map {i : (dirty, s)}, where i is an + # index, dirty indicates whether the value has to be written to + # disk, and s is the packed value for the index. Up to + # self._max_memory_cache items can be stored here. When the cache + # fills up, it is written to disk in one go and then cleared. 
+ self._cache = {} + + # The index just beyond the last record ever written: + self._limit = os.path.getsize(self.filename) // self.packer.record_len + + # The index just beyond the last record ever written to disk: + self._limit_written = self._limit + + def __str__(self): + return 'RecordTable(%r)' % (self.filename,) def flush(self): - pairs = self.cache.items() + Log().debug('Flushing cache for %s' % (self,)) + + pairs = [(i, s) for (i, (dirty, s)) in self._cache.items() if dirty] + + if pairs: pairs.sort() old_i = None f = self.f - for (i, v) in pairs: - if i != old_i: - f.seek(i * self.record_len) - f.write(self.pack(v)) + for (i, s) in pairs: + if i == old_i: + # No seeking needed + pass + elif i <= self._limit_written: + # Just jump there: + f.seek(i * self.packer.record_len) + else: + # Jump to the end of the file then write _empty_values until + # we reach the correct location: + f.seek(self._limit_written * self.packer.record_len) + while self._limit_written < i: + f.write(self.packer.empty_value) + self._limit_written += 1 + f.write(s) old_i = i + 1 - self.cache.clear() + self._limit_written = max(self._limit_written, old_i) - def pack(self, v): - """Pack record v into a string of length self.record_len.""" + self.f.flush() - raise NotImplementedError() + self._cache.clear() + + def _set_packed_record(self, i, s): + """Set the value for index I to the packed value S.""" + + if self.mode == DB_OPEN_READ: + raise RecordTableAccessError() + if i < 0: + raise KeyError() + self._cache[i] = (True, s) + if len(self._cache) >= self._max_memory_cache: + self.flush() + self._limit = max(self._limit, i + 1) def __setitem__(self, i, v): - self.cache[i] = v - if len(self.cache) >= self.max_memory_cache: + self._set_packed_record(i, self.packer.pack(v)) + + def __getitem__(self, i): + """Return the item for index I. 
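The `flush` logic above pads any unwritten gap with the packer's empty value before writing a record past the current end of file. The same fixed-length, seek-addressed layout can be sketched with a plain file (hypothetical helper, not cvs2svn's API):

```python
import os
import tempfile

RECORD_LEN = 4
EMPTY = b'\0' * RECORD_LEN

def write_record(f, i, data, limit):
    """Write record I; pad the gap [limit, i) with EMPTY; return the new limit."""
    assert len(data) == RECORD_LEN
    f.seek(limit * RECORD_LEN)
    while limit < i:
        f.write(EMPTY)          # fill never-written slots with the empty value
        limit += 1
    f.seek(i * RECORD_LEN)
    f.write(data)
    return max(limit, i + 1)

fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, 'wb+') as f:
    limit = write_record(f, 3, b'abcd', 0)   # indices 0-2 become EMPTY
    f.flush()
    assert os.path.getsize(path) == 4 * RECORD_LEN
    f.seek(0)
    assert f.read(RECORD_LEN) == EMPTY       # a gap reads back as the empty value
os.remove(path)
```

Reading a slot that still holds the empty value is what the class above maps to `KeyError`.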
+ + Raise KeyError if that item has never been set (or if it was set + to self.packer.empty_value).""" + + try: + s = self._cache[i][1] + except KeyError: + if not 0 <= i < self._limit_written: + raise KeyError(i) + self.f.seek(i * self.packer.record_len) + s = self.f.read(self.packer.record_len) + self._cache[i] = (False, s) + if len(self._cache) >= self._max_memory_cache: self.flush() + if s == self.packer.empty_value: + raise KeyError(i) + + return self.packer.unpack(s) + + def get_many(self, indexes): + """Generate the items for the specified INDEXES in arbitrary order.""" + + indexes = list(indexes) + # Sort the indexes to reduce disk seeking: + indexes.sort() + for i in indexes: + yield self[i] + + def get(self, i, default=None): + try: + return self[i] + except KeyError: + return default + + def __delitem__(self, i): + """Delete the item for index I. + + Raise KeyError if that item has never been set (or if it was set + to self.packer.empty_value).""" + + if self.mode == DB_OPEN_READ: + raise RecordTableAccessError() + + # Check that the value was set (otherwise raise KeyError): + self[i] + self._set_packed_record(i, self.packer.empty_value) + + def iterkeys(self): + """Return the keys in the map in key order.""" + + for i in xrange(0, self._limit): + try: + self[i] + yield i + except KeyError: + pass + + def itervalues(self): + """Yield the values in the map in key order. 
+ + Skip over values that haven't been defined.""" + + for i in xrange(0, self._limit): + try: + yield self[i] + except KeyError: + pass + def close(self): self.flush() + self._cache = None self.f.close() + self.f = None -class OldRecordTable: - def __init__(self, filename, record_len): - self.f = open(filename, 'rb') - self.record_len = record_len - self.limit = os.path.getsize(filename) // self.record_len +class MmapRecordTable: + GROWTH_INCREMENT = 65536 - def unpack(self, s): - """Unpack string S into a record.""" + def __init__(self, filename, mode, packer): + self.filename = filename + self.mode = mode + self.packer = packer + if self.mode == DB_OPEN_NEW: + self.python_file = open(self.filename, 'wb+') + self.python_file.write('\0' * self.GROWTH_INCREMENT) + self.python_file.flush() + self._filesize = self.GROWTH_INCREMENT + self.f = mmap.mmap( + self.python_file.fileno(), self._filesize, + access=mmap.ACCESS_WRITE + ) + + # The index just beyond the last record ever written: + self._limit = 0 + elif self.mode == DB_OPEN_WRITE: + self.python_file = open(self.filename, 'rb+') + self._filesize = os.path.getsize(self.filename) + self.f = mmap.mmap( + self.python_file.fileno(), self._filesize, + access=mmap.ACCESS_WRITE + ) + + # The index just beyond the last record ever written: + self._limit = os.path.getsize(self.filename) // self.packer.record_len + elif self.mode == DB_OPEN_READ: + self.python_file = open(self.filename, 'rb') + self._filesize = os.path.getsize(self.filename) + self.f = mmap.mmap( + self.python_file.fileno(), self._filesize, + access=mmap.ACCESS_READ + ) + + # The index just beyond the last record ever written: + self._limit = os.path.getsize(self.filename) // self.packer.record_len + else: + raise RuntimeError('Invalid mode %r' % self.mode) - raise NotImplementedError() + def __str__(self): + return 'MmapRecordTable(%r)' % (self.filename,) + + def flush(self): + self.f.flush() + + def _set_packed_record(self, i, s): + """Set the value for 
index I to the packed value S.""" + + if self.mode == DB_OPEN_READ: + raise RecordTableAccessError() + if i < 0: + raise KeyError() + if i >= self._limit: + # This write extends the range of valid indices. First check + # whether the file has to be enlarged: + new_size = (i + 1) * self.packer.record_len + if new_size > self._filesize: + self._filesize = ( + (new_size + self.GROWTH_INCREMENT - 1) + // self.GROWTH_INCREMENT + * self.GROWTH_INCREMENT + ) + self.f.resize(self._filesize) + # Now pad up to the new record with empty_value, then write record: + self.f.seek(self._limit * self.packer.record_len) + if i > self._limit: + self.f.write(self.packer.empty_value * (i - self._limit)) + self.f.write(s) + self._limit = i + 1 + else: + self.f.seek(i * self.packer.record_len) + self.f.write(s) + + def __setitem__(self, i, v): + self._set_packed_record(i, self.packer.pack(v)) def __getitem__(self, i): - if not 0 <= i < self.limit: + """Return the item for index I. + + Raise KeyError if that item has never been set (or if it was set + to self.packer.empty_value).""" + + if not 0 <= i < self._limit: raise KeyError(i) - self.f.seek(i * self.record_len) - s = self.f.read(self.record_len) - return self.unpack(s) + self.f.seek(i * self.packer.record_len) + s = self.f.read(self.packer.record_len) + + if s == self.packer.empty_value: + raise KeyError(i) + + return self.packer.unpack(s) - def __iter__(self): - for i in xrange(0, self.limit): + def get(self, i, default=None): + try: + return self[i] + except KeyError: + return default + + def __delitem__(self, i): + """Delete the item for index I. 
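`MmapRecordTable` rounds its backing file up to the next `GROWTH_INCREMENT` multiple and addresses record `i` at offset `i * record_len` through the mapping. Both ideas in a standalone sketch using only the standard library (assumed names, not the project's code):

```python
import mmap
import os
import tempfile

GROWTH_INCREMENT = 65536
RECORD_LEN = 8

def round_up(n, inc=GROWTH_INCREMENT):
    """Round N up to the next multiple of INC (the resize arithmetic above)."""
    return (n + inc - 1) // inc * inc

assert round_up(1) == GROWTH_INCREMENT
assert round_up(GROWTH_INCREMENT + 1) == 2 * GROWTH_INCREMENT

fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb+') as pyfile:
    pyfile.write(b'\0' * GROWTH_INCREMENT)    # pre-size the file before mapping
    pyfile.flush()
    m = mmap.mmap(pyfile.fileno(), GROWTH_INCREMENT, access=mmap.ACCESS_WRITE)
    m.seek(5 * RECORD_LEN)                    # random access by record index
    m.write(b'record05')
    m.seek(5 * RECORD_LEN)
    assert m.read(RECORD_LEN) == b'record05'
    m.close()
os.remove(path)
```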
+ + Raise KeyError if that item has never been set (or if it was set + to self.packer.empty_value).""" + + if self.mode == DB_OPEN_READ: + raise RecordTableAccessError() + + # Check that the value was set (otherwise raise KeyError): + self[i] + self._set_packed_record(i, self.packer.empty_value) + + def iterkeys(self): + """Yield the keys in the map in order.""" + + for i in xrange(0, self._limit): + try: + self[i] + yield i + except KeyError: + pass + + def itervalues(self): + """Yield the values in the map in key order. + + Skip over values that haven't been defined.""" + + for i in xrange(0, self._limit): + try: yield self[i] + except KeyError: + pass def close(self): + self.flush() self.f.close() + self.python_file.close() diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/revision_excluder.py cvs2svn-2.0.0/cvs2svn_lib/revision_excluder.py --- cvs2svn-1.5.x/cvs2svn_lib/revision_excluder.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/revision_excluder.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,93 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module provides an interface for being informed about exclusions. + +Currently, revisions can be excluded one branch at a time via the +--exclude option. 
This interface can be used to inform revision +recorders about branches that are being excluded. (The recorder might +use that information to reduce the amount of temporary data that is +collected.)""" + + +class RevisionExcluder: + """An interface for being informed about excluded revisions.""" + + def __init__(self): + """Initialize the RevisionExcluder. + + Please note that a RevisionExcluder is instantiated in every + program run, even if the branch-exclusion pass will not be + executed. (This is to allow its register_artifacts() method to be + called.) Therefore, the __init__() method should not do much, and + more substantial preparation for use (like actually creating the + artifacts) should be done in start().""" + + pass + + def register_artifacts(self, which_pass): + """Register artifacts that will be needed during branch exclusion. + + WHICH_PASS is the pass that will call our callbacks, so it should + be used to do the registering (e.g., call + WHICH_PASS.register_temp_file() and/or + WHICH_PASS.register_temp_file_needed()).""" + + raise NotImplementedError() + + def start(self): + """Prepare to handle branch exclusions.""" + + raise NotImplementedError() + + def process_file(self, cvs_file_items): + """Called for files whose trees were modified in FilterSymbolsPass. 
+ + This callback is called once for each CVSFile whose topology was + modified in FilterSymbolsPass.""" + + raise NotImplementedError() + + def skip_file(self, cvs_file): + """Called when a file's dependency topology didn't have to be changed.""" + + raise NotImplementedError() + + def finish(self): + """Called after all branch exclusions for all files are done.""" + + raise NotImplementedError() + + +class NullRevisionExcluder(RevisionExcluder): + """A do-nothing variety of RevisionExcluder.""" + + def register_artifacts(self, which_pass): + pass + + def start(self): + pass + + def process_file(self, cvs_file_items): + pass + + def skip_file(self, cvs_file): + pass + + def finish(self): + pass + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/revision_reader.py cvs2svn-2.0.0/cvs2svn_lib/revision_reader.py --- cvs2svn-1.5.x/cvs2svn_lib/revision_reader.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/revision_reader.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,207 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2000-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. 
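`NullRevisionExcluder` above is a null object: it satisfies the `RevisionExcluder` interface with no-ops so that callers never have to test for `None`. The pattern in miniature, with hypothetical classes:

```python
class Recorder:
    """Interface: subclasses must implement record()."""
    def record(self, item):
        raise NotImplementedError()

class ListRecorder(Recorder):
    """A real implementation that collects items."""
    def __init__(self):
        self.items = []
    def record(self, item):
        self.items.append(item)

class NullRecorder(Recorder):
    """The null object: same interface, deliberately does nothing."""
    def record(self, item):
        pass

def run(recorder):
    # No "if recorder is not None" checks needed anywhere:
    for item in ('a', 'b'):
        recorder.record(item)

real = ListRecorder()
run(real)
run(NullRecorder())
assert real.items == ['a', 'b']
```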
+# ==================================================================== + +"""This module provides access to the CVS repository for cvs2svn.""" + + +import os + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.common import FatalError +from cvs2svn_lib.common import CommandError +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.process import check_command_runs +from cvs2svn_lib.process import SimplePopen +from cvs2svn_lib.process import CommandFailedException +from cvs2svn_lib.revision_recorder import NullRevisionRecorder +from cvs2svn_lib.revision_excluder import NullRevisionExcluder + + +class PipeStream(object): + """A file-like object from which revision contents can be read.""" + + def __init__(self, pipe_command): + self.pipe_command = ' '.join(pipe_command) + self.pipe = SimplePopen(pipe_command, True) + self.pipe.stdin.close() + + def read(self, size=None): + if size is None: + return self.pipe.stdout.read() + else: + return self.pipe.stdout.read(size) + + def close(self): + self.pipe.stdout.close() + error_output = self.pipe.stderr.read() + exit_status = self.pipe.wait() + if exit_status: + raise CommandError(self.pipe_command, exit_status, error_output) + + +class RevisionReader(object): + """An object that can read the contents of CVSRevisions.""" + + def register_artifacts(self, which_pass): + """Register artifacts that will be needed during branch exclusion. + + WHICH_PASS is the pass that will call our callbacks, so it should + be used to do the registering (e.g., call + WHICH_PASS.register_temp_file() and/or + WHICH_PASS.register_temp_file_needed()).""" + + raise NotImplementedError() + + def get_revision_recorder(self): + """Return a RevisionRecorder instance that can gather revision info. + + The object returned by this method will be passed to CollectData, + and its callback methods called as the CVS files are parsed. 
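`PipeStream` above defers error reporting until `close()`, when the child's exit status and stderr output are finally known. cvs2svn uses its own `SimplePopen` helper; with only the standard library the same pattern looks roughly like this sketch:

```python
import subprocess

class PipeStream:
    """Read a command's stdout; raise on nonzero exit when closed."""
    def __init__(self, argv):
        self.argv = argv
        self.pipe = subprocess.Popen(
            argv, stdin=subprocess.PIPE,
            stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        self.pipe.stdin.close()       # the child gets no input

    def read(self, size=-1):
        return self.pipe.stdout.read(size)

    def close(self):
        self.pipe.stdout.close()
        error = self.pipe.stderr.read()
        self.pipe.stderr.close()
        if self.pipe.wait():          # nonzero exit status
            raise RuntimeError('%r failed: %s' % (self.argv, error))

s = PipeStream(['echo', 'hello'])
data = s.read()
s.close()
assert data.strip() == b'hello'
```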
If + no data collection is necessary, this method can return an + instance of NullRevisionRecorder.""" + + raise NotImplementedError + + def get_revision_excluder(self): + """Return a RevisionExcluder instance to collect exclusion info. + + The object returned by this method will have its callback methods + called as branches are excluded. If such information is not + needed, this method can return an instance of + NullRevisionExcluder.""" + + raise NotImplementedError + + def start(self): + """Prepare for calls to get_content_stream.""" + + raise NotImplementedError + + def get_content_stream(self, cvs_rev, suppress_keyword_substitution=False): + """Return a file-like object from which the contents of CVS_REV + can be read. + + CVS_REV is a CVSRevision. If SUPPRESS_KEYWORD_SUBSTITUTION is + True, then suppress the substitution of RCS/CVS keywords in the + output.""" + + raise NotImplementedError + + def skip_content(self, cvs_rev): + """Inform the reader that CVS_REV would be fetched now, but isn't + actually needed. + + This may be used for internal housekeeping. + Note that this is not called for CVSRevisionDelete revisions.""" + + raise NotImplementedError + + def finish(self): + """Inform the reader that all calls to get_content_stream are done. + Start may be called again at a later point.""" + + raise NotImplementedError + + +class RCSRevisionReader(RevisionReader): + """A RevisionReader that reads the contents via RCS.""" + + def __init__(self, co_executable): + self.co_executable = co_executable + try: + check_command_runs([self.co_executable, '-V'], self.co_executable) + except CommandFailedException, e: + raise FatalError('%s\n' + 'Please check that co is installed and in your PATH\n' + '(it is a part of the RCS software).' 
% (e,)) + + def register_artifacts(self, which_pass): + pass + + def get_revision_recorder(self): + return NullRevisionRecorder() + + def get_revision_excluder(self): + return NullRevisionExcluder() + + def start(self): + pass + + def get_content_stream(self, cvs_rev, suppress_keyword_substitution=False): + pipe_cmd = [self.co_executable, '-q', '-x,v', '-p' + cvs_rev.rev] + if suppress_keyword_substitution: + pipe_cmd.append('-kk') + pipe_cmd.append(cvs_rev.cvs_file.filename) + return PipeStream(pipe_cmd) + + def skip_content(self, cvs_rev): + pass + + def finish(self): + pass + + +class CVSRevisionReader(RevisionReader): + """A RevisionReader that reads the contents via CVS.""" + + def __init__(self, cvs_executable): + self.cvs_executable = cvs_executable + + def cvs_ok(global_arguments): + check_command_runs( + [self.cvs_executable] + global_arguments + ['--version'], + self.cvs_executable) + + self.global_arguments = [ "-q", "-R" ] + try: + cvs_ok(self.global_arguments) + except CommandFailedException, e: + self.global_arguments = [ "-q" ] + try: + cvs_ok(self.global_arguments) + except CommandFailedException, e: + raise FatalError( + '%s\n' + 'Please check that cvs is installed and in your PATH.' 
% (e,)) + + def register_artifacts(self, which_pass): + pass + + def get_revision_recorder(self): + return NullRevisionRecorder() + + def get_revision_excluder(self): + return NullRevisionExcluder() + + def start(self): + pass + + def get_content_stream(self, cvs_rev, suppress_keyword_substitution=False): + project = cvs_rev.cvs_file.project + pipe_cmd = [self.cvs_executable] + self.global_arguments + \ + ['-d', project.cvs_repository_root, + 'co', '-r' + cvs_rev.rev, '-p'] + if suppress_keyword_substitution: + pipe_cmd.append('-kk') + pipe_cmd.append(project.cvs_module + cvs_rev.cvs_path) + return PipeStream(pipe_cmd) + + def skip_content(self, cvs_rev): + pass + + def finish(self): + pass + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/revision_recorder.py cvs2svn-2.0.0/cvs2svn_lib/revision_recorder.py --- cvs2svn-1.5.x/cvs2svn_lib/revision_recorder.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/revision_recorder.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,108 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module provides objects that can record CVS revision contents.""" + + +class RevisionRecorder: + """An object that can record text and deltas from CVS files.""" + + def __init__(self): + """Initialize the RevisionRecorder. 
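`CVSRevisionReader` above probes whether `cvs` accepts the read-only `-R` global option and silently falls back to plain `-q` if not. The probe-and-fall-back idea, sketched with stdlib `subprocess` and a stand-in command (`true` here is only a placeholder for `cvs`):

```python
import subprocess

def probe(executable, args):
    """Return True if EXECUTABLE exits successfully with ARGS."""
    try:
        return subprocess.call(
            [executable] + args,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0
    except OSError:
        return False              # executable not found / not runnable

def pick_global_args(executable, candidates):
    """Return the first argument set the tool accepts, else raise."""
    for args in candidates:
        if probe(executable, args + ['--version']):
            return args
    raise RuntimeError('%s is not usable' % executable)

# 'true' accepts anything, so the preferred set wins:
args = pick_global_args('true', [['-q', '-R'], ['-q']])
assert args == ['-q', '-R']
```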
+ + Please note that a RevisionRecorder is instantiated in every + program run, even if the data-collection pass will not be + executed. (This is to allow it to register the artifacts that it + produces.) Therefore, the __init__() method should not do much, + and more substantial preparation for use (like actually creating + the artifacts) should be done in start().""" + + pass + + def register_artifacts(self, which_pass): + """Register artifacts that will be needed during data recording. + + WHICH_PASS is the pass that will call our callbacks, so it should + be used to do the registering (e.g., call + WHICH_PASS.register_temp_file() and/or + WHICH_PASS.register_temp_file_needed()).""" + + raise NotImplementedError() + + def start(self): + """Data will soon start being collected. + + Any non-idempotent initialization should be done here.""" + + raise NotImplementedError() + + def start_file(self, cvs_file): + """Prepare to receive data for the specified file. + + CVS_FILE is an instance of CVSFile.""" + + raise NotImplementedError() + + def record_text(self, revisions_data, revision, log, text): + """Record information about a revision and optionally return a token. + + REVISIONS_DATA is a map { rev : _RevisionData } containing + collect_data._RevisionData instances for all revisions in this + file. REVISION is the revision number of the current revision. + LOG and TEXT are the log message and text (as retrieved from the + RCS file) for that revision. (TEXT is full text for the HEAD + revision, and deltas for other revisions.)""" + + raise NotImplementedError() + + def finish_file(self, cvs_file_items): + """The current file is finished; finish and clean up. + + REVISIONS_DATA is a map { rev : _RevisionData } containing + _RevisionData instances for all revisions in this file. 
ROOT_REV + is the revision number of the revision that is the root of the + dependency tree (usually '1.1').""" + + raise NotImplementedError() + + def finish(self): + """All recording is done; clean up.""" + + raise NotImplementedError() + + +class NullRevisionRecorder(RevisionRecorder): + """A do-nothing variety of RevisionRecorder.""" + + def register_artifacts(self, which_pass): + pass + + def start(self): + pass + + def start_file(self, cvs_file): + pass + + def record_text(self, revisions_data, revision, log, text): + return None + + def finish_file(self, cvs_file_items): + pass + + def finish(self): + pass + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/run_options.py cvs2svn-2.0.0/cvs2svn_lib/run_options.py --- cvs2svn-1.5.x/cvs2svn_lib/run_options.py 2006-12-20 13:43:46.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/run_options.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -23,16 +23,19 @@ import sys import os import re import getopt +import time try: my_getopt = getopt.gnu_getopt except AttributeError: my_getopt = getopt.getopt from cvs2svn_lib.boolean import * +from cvs2svn_lib.version import VERSION from cvs2svn_lib import config from cvs2svn_lib.common import warning_prefix from cvs2svn_lib.common import error_prefix from cvs2svn_lib.common import FatalError +from cvs2svn_lib.common import UTF8Encoder from cvs2svn_lib.log import Log from cvs2svn_lib.context import Ctx from cvs2svn_lib.output_option import DumpfileOutputOption @@ -40,6 +43,9 @@ from cvs2svn_lib.output_option import Ne from cvs2svn_lib.output_option import ExistingRepositoryOutputOption from cvs2svn_lib.project import Project from cvs2svn_lib.pass_manager import InvalidPassError +from cvs2svn_lib.revision_reader import RCSRevisionReader +from cvs2svn_lib.revision_reader import CVSRevisionReader +from cvs2svn_lib.checkout_internal import InternalRevisionReader from cvs2svn_lib.symbol_strategy import AllBranchRule from cvs2svn_lib.symbol_strategy import AllTagRule from cvs2svn_lib.symbol_strategy import BranchIfCommitsRule @@ -51,124 +57,176 @@ from cvs2svn_lib.symbol_strategy import from cvs2svn_lib.symbol_strategy import UnambiguousUsageRule from cvs2svn_lib.symbol_transform import RegexpSymbolTransform from cvs2svn_lib.property_setters import AutoPropsPropertySetter -from cvs2svn_lib.property_setters import BinaryFileDefaultMimeTypeSetter -from cvs2svn_lib.property_setters import BinaryFileEOLStyleSetter +from cvs2svn_lib.property_setters import CVSBinaryFileDefaultMimeTypeSetter +from cvs2svn_lib.property_setters import CVSBinaryFileEOLStyleSetter from cvs2svn_lib.property_setters import CVSRevisionNumberSetter from cvs2svn_lib.property_setters import DefaultEOLStyleSetter from cvs2svn_lib.property_setters import EOLStyleFromMimeTypeSetter from cvs2svn_lib.property_setters import ExecutablePropertySetter from cvs2svn_lib.property_setters import 
KeywordsPropertySetter from cvs2svn_lib.property_setters import MimeMapper +from cvs2svn_lib.property_setters import SVNBinaryFileKeywordsPropertySetter usage_message_template = """\ -USAGE: %(progname)s [-v] [-s svn-repos-path] [-p pass] cvs-repos-path - --help, -h print this usage message and exit with success - --help-passes list the available passes and their numbers - --version print the version number - --verbose, -v verbose - --quiet, -q quiet - --options=PATH read the conversion options from the specified path - -s PATH path for SVN repos - -p PASS execute only specified PASS - -p [START]:[END] execute passes START through END, inclusive - (PASS, START, and END can be pass names or numbers) - --existing-svnrepos load into existing SVN repository +Usage: %(progname)s --options OPTIONFILE + %(progname)s [OPTION...] OUTPUT-OPTION CVS-REPOS-PATH +%(progname)s converts a CVS repository into a Subversion repository, including +history. + + Configuration via options file: + + --options=PATH read the conversion options from PATH. This + method allows more flexibility than using + command-line options. See documentation for info + + Output options: + + -s, --svnrepos=PATH path where SVN repos should be created + --existing-svnrepos load into existing SVN repository (for use with + --svnrepos) + --fs-type=TYPE pass --fs-type=TYPE to "svnadmin create" (for use + with --svnrepos) + --bdb-txn-nosync pass --bdb-txn-nosync to "svnadmin create" (for + use with --svnrepos) --dumpfile=PATH just produce a dumpfile; don't commit to a repos --dry-run do not create a repository or a dumpfile; just print what would happen. 
- --use-cvs use CVS instead of RCS 'co' to extract data - (only use this if having problems with RCS) + + Conversion options: + --trunk-only convert only trunk commits, not tags nor branches --trunk=PATH path for trunk (default: %(trunk_base)s) --branches=PATH path for branches (default: %(branches_base)s) --tags=PATH path for tags (default: %(tags_base)s) --no-prune don't prune empty directories --encoding=ENC encoding for paths and log messages in CVS repos. - If option is specified multiple times, the encoders - will be tried in order until one succeeds. - --fallback-encoding=ENC If all --encodings fail, use lossy encoding with ENC + If option is specified multiple times, encoders + are tried in order until one succeeds. See + http://docs.python.org/lib/standard-encodings.html + for a list of standard Python encodings. + --fallback-encoding=ENC If all --encodings fail, use lossy encoding + with ENC + --symbol-transform=P:S transform symbol names from P to S, where P and S + use Python regexp and reference syntax + respectively. P must match the whole symbol name --force-branch=REGEXP force symbols matching REGEXP to be branches --force-tag=REGEXP force symbols matching REGEXP to be tags --exclude=REGEXP exclude branches and tags matching REGEXP - --symbol-default=OPT choose how ambiguous symbols are converted. OPT is - "branch", "tag", or "heuristic", or "strict" (default) - --symbol-transform=P:S transform symbol names from P to S where P and S - use Python regexp and reference syntax respectively + --symbol-default=OPT specify how ambiguous symbols are converted. + OPT is "branch", "tag", "heuristic", or + "strict" (default) + --no-cross-branch-commits Prevent the creation of cross-branch commits + --retain-conflicting-attic-files if a file appears both in and out of + the CVS Attic, then leave the attic version in a + SVN directory called "Attic". 
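The `--symbol-transform=P:S` option described above pairs a Python regexp P (which must match the whole symbol name) with a reference-syntax replacement S. A hypothetical transform mapping tags like `RELEASE_1_2` to `release-1.2`:

```python
import re

# Hypothetical transform, equivalent in spirit to:
#   --symbol-transform='RELEASE_(\d+)_(\d+):release-\1.\2'
pattern = re.compile(r'^RELEASE_(\d+)_(\d+)$')   # anchored: P matches the whole name

def transform(symbol):
    m = pattern.match(symbol)
    # Non-matching symbols pass through unchanged.
    return m.expand(r'release-\1.\2') if m else symbol

assert transform('RELEASE_1_2') == 'release-1.2'
assert transform('mybranch') == 'mybranch'
```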
--username=NAME username for cvs2svn-synthesized commits - --fs-type=TYPE pass --fs-type=TYPE to "svnadmin create" - --bdb-txn-nosync pass --bdb-txn-nosync to "svnadmin create" --cvs-revnums record CVS revision numbers as file properties --mime-types=FILE specify an apache-style mime.types file for setting svn:mime-type + --eol-from-mime-type set svn:eol-style from mime type if known --auto-props=FILE set file properties from the auto-props section of a file in svn config format - --auto-props-ignore-case Ignore case when matching auto-props patterns - --eol-from-mime-type set svn:eol-style from mime type if known - --no-default-eol don't set svn:eol-style to 'native' for - non-binary files with undetermined mime types + --default-eol=VALUE default svn:eol-style for non-binary files with + undetermined mime types. VALUE is "binary" + (default), "native", "CRLF", "LF", or "CR" --keywords-off don't set svn:keywords on any files (by default, cvs2svn sets svn:keywords on non-binary files to "%(svn_keywords_value)s") - --tmpdir=PATH directory to use for tmp data (default to cwd) - --skip-cleanup prevent the deletion of intermediate files - --profile profile with 'hotshot' (into file cvs2svn.hotshot) + + Extraction options: + + --use-rcs use RCS to extract revision contents (default) + --use-cvs use CVS to extract revision contents + (only use this if having problems with RCS) + --use-internal-co use internal code to extract revision contents + (very fast but disk space intensive) + + Environment options: + + --tmpdir=PATH directory to use for temporary data files + (default "cvs2svn-tmp") --svnadmin=PATH path to the "svnadmin" program - --co=PATH path to the "co" program (required if not --use-cvs) + --co=PATH path to the "co" program (required if --use-rcs) --cvs=PATH path to the "cvs" program (required if --use-cvs) --sort=PATH path to the GNU "sort" program -""" -def usage(): - sys.stdout.write(usage_message_template % { - 'progname' : os.path.basename(sys.argv[0]), 
- 'trunk_base' : config.DEFAULT_TRUNK_BASE, - 'branches_base' : config.DEFAULT_BRANCHES_BASE, - 'tags_base' : config.DEFAULT_TAGS_BASE, - 'svn_keywords_value' : config.SVN_KEYWORDS_VALUE, - }) + Partial conversions: + + -p, --pass PASS execute only specified PASS of conversion + -p, --passes [START]:[END] execute passes START through END, inclusive (PASS, + START, and END can be pass names or numbers) + Information options: + + --version print the version number + -h, --help print this usage message and exit with success + --help-passes list the available passes and their numbers + -v, --verbose verbose (may be specified twice for debug output) + -q, --quiet quiet (may be specified twice for very quiet) + --skip-cleanup prevent the deletion of intermediate files + --profile profile with 'hotshot' (into file cvs2svn.hotshot) +""" class RunOptions: """A place to store meta-options that are used to start the conversion.""" - def __init__(self, pass_manager): - """Process the command-line options, storing run options to SELF.""" + def __init__(self, progname, cmd_args, pass_manager): + """Process the command-line options, storing run options to SELF. + + PROGNAME is the name of the program, used in the usage string. + CMD_ARGS is the list of command-line arguments passed to the + program. 
PASS_MANAGER is an instance of PassManager, needed to + help process the -p and --help-passes options.""" self.pass_manager = pass_manager self.start_pass = 1 self.end_pass = self.pass_manager.num_passes self.profiling = False + self.progname = progname try: - self.opts, self.args = my_getopt(sys.argv[1:], 'hvqs:p:', [ - "help", "help-passes", "version", - "verbose", "quiet", - "existing-svnrepos", "dumpfile=", "dry-run", - "use-cvs", + self.opts, self.args = my_getopt(cmd_args, 'hvqs:p:', [ + "options=", + + "svnrepos=", "existing-svnrepos", "fs-type=", "bdb-txn-nosync", + "dumpfile=", + "dry-run", + "trunk-only", "trunk=", "branches=", "tags=", "no-prune", "encoding=", "fallback-encoding=", - "force-branch=", "force-tag=", "exclude=", "symbol-default=", "symbol-transform=", + "force-branch=", "force-tag=", "exclude=", "symbol-default=", + "no-cross-branch-commits", + "retain-conflicting-attic-files", "username=", - "fs-type=", "bdb-txn-nosync", "cvs-revnums", "mime-types=", - "auto-props=", "auto-props-ignore-case", - "eol-from-mime-type", "no-default-eol", + "auto-props=", + "eol-from-mime-type", "default-eol=", "keywords-off", + + "use-rcs", "use-cvs", "use-internal-co", + "tmpdir=", + "svnadmin=", "co=", "cvs=", "sort=", + + "pass=", "passes=", + + "version", "help", "help-passes", + "verbose", "quiet", "skip-cleanup", "profile", - "svnadmin=", "co=", "cvs=", "sort=", - "dump-only", "create", - "options=", + + # These options are deprecated and are only included for + # backwards compatibility: + "dump-only", "create", "no-default-eol", "auto-props-ignore-case", ]) except getopt.GetoptError, e: sys.stderr.write(error_prefix + ': ' + str(e) + '\n\n') - usage() + self.usage() sys.exit(1) # First look for any 'help'-type options, as they just cause the @@ -187,6 +245,14 @@ class RunOptions: # --options: self.process_common_options() + # Now the log level has been set; log the time when the run started: + Log().verbose( + time.strftime( + 'Conversion start time: 
%Y-%m-%d %I:%M:%S %Z', + time.localtime(Log().start_time) + ) + ) + if options_file_found: # All of the options that are compatible with --options have # been consumed above. It is an error if any other options or @@ -204,13 +270,13 @@ class RunOptions: """Process any help-type options.""" if self.get_options('-h', '--help'): - usage() + self.usage() sys.exit(0) elif self.get_options('--help-passes'): self.pass_manager.help_passes() sys.exit(0) elif self.get_options('--version'): - print '%s version %s' % (os.path.basename(sys.argv[0]), Ctx().VERSION) + print '%s version %s' % (os.path.basename(self.progname), VERSION) sys.exit(0) def process_common_options(self): @@ -223,7 +289,7 @@ class RunOptions: for (opt, value) in self.get_options('--quiet', '-q'): Log().decrease_verbosity() - for (opt, value) in self.get_options('-p'): + for (opt, value) in self.get_options('--pass', '--passes', '-p'): if value.find(':') >= 0: start_pass, end_pass = value.split(':') self.start_pass = self.pass_manager.get_pass_number( @@ -253,29 +319,42 @@ class RunOptions: bdb_txn_nosync = False dump_only = False dumpfile = None + use_rcs = False + use_cvs = False + use_internal_co = False symbol_strategy_default = 'strict' mime_types_file = None auto_props_file = None - auto_props_ignore_case = False + auto_props_ignore_case = True eol_from_mime_type = False - no_default_eol = False + default_eol = None keywords_off = False + co_executable = config.CO_EXECUTABLE + cvs_executable = config.CVS_EXECUTABLE trunk_base = config.DEFAULT_TRUNK_BASE branches_base = config.DEFAULT_BRANCHES_BASE tags_base = config.DEFAULT_TAGS_BASE + encodings = ['ascii'] + fallback_encoding = None + force_branch = False + force_tag = False symbol_transforms = [] ctx.symbol_strategy = RuleBasedSymbolStrategy() for opt, value in self.opts: - if opt == '-s': + if opt in ['-s', '--svnrepos']: target = value elif opt == '--existing-svnrepos': existing_svnrepos = True elif opt == '--dumpfile': dumpfile = value + elif opt 
== '--use-rcs': + use_rcs = True elif opt == '--use-cvs': - ctx.use_cvs = True + use_cvs = True + elif opt == '--use-internal-co': + use_internal_co = True elif opt == '--trunk-only': ctx.trunk_only = True elif opt == '--trunk': @@ -287,13 +366,15 @@ class RunOptions: elif opt == '--no-prune': ctx.prune = False elif opt == '--encoding': - ctx.encoding.insert(-1, value) + encodings.insert(-1, value) elif opt == '--fallback-encoding': - ctx.fallback_encoding = value + fallback_encoding = value elif opt == '--force-branch': ctx.symbol_strategy.add_rule(ForceBranchRegexpStrategyRule(value)) + force_branch = True elif opt == '--force-tag': ctx.symbol_strategy.add_rule(ForceTagRegexpStrategyRule(value)) + force_tag = True elif opt == '--exclude': ctx.symbol_strategy.add_rule(ExcludeRegexpStrategyRule(value)) elif opt == '--symbol-default': @@ -301,12 +382,16 @@ class RunOptions: raise FatalError( '%r is not a valid option for --symbol_default.' % (value,)) symbol_strategy_default = value + elif opt == '--no-cross-branch-commits': + ctx.cross_branch_commits = False + elif opt == '--retain-conflicting-attic-files': + ctx.retain_conflicting_attic_files = True elif opt == '--symbol-transform': [pattern, replacement] = value.split(":") try: symbol_transforms.append( RegexpSymbolTransform(pattern, replacement)) - except re.error, e: + except re.error: raise FatalError("'%s' is not a valid regexp." % (pattern,)) elif opt == '--username': ctx.username = value @@ -321,11 +406,25 @@ class RunOptions: elif opt == '--auto-props': auto_props_file = value elif opt == '--auto-props-ignore-case': + # "ignore case" is now the default, so this option doesn't + # affect anything. 
auto_props_ignore_case = True elif opt == '--eol-from-mime-type': eol_from_mime_type = True + elif opt == '--default-eol': + try: + # Check that value is valid, and translate it to the proper case + default_eol = { + 'binary' : None, 'native' : 'native', + 'crlf' : 'CRLF', 'lf' : 'LF', 'cr' : 'CR', + }[value.lower()] + except KeyError: + raise FatalError( + 'Illegal value specified for --default-eol: %s' % (value,) + ) elif opt == '--no-default-eol': - no_default_eol = True + # For backwards compatibility: + default_eol = None elif opt == '--keywords-off': keywords_off = True elif opt == '--tmpdir': @@ -335,9 +434,9 @@ class RunOptions: elif opt == '--svnadmin': ctx.svnadmin_executable = value elif opt == '--co': - ctx.co_executable = value + co_executable = value elif opt == '--cvs': - ctx.cvs_executable = value + cvs_executable = value elif opt == '--sort': ctx.sort_executable = value elif opt == '--dump-only': @@ -352,13 +451,13 @@ class RunOptions: # Consistency check for options and arguments. if len(self.args) == 0: - usage() + self.usage() sys.exit(1) if len(self.args) > 1: sys.stderr.write(error_prefix + ": must pass only one CVS repository.\n") - usage() + self.usage() sys.exit(1) cvsroot = self.args[0] @@ -389,6 +488,21 @@ class RunOptions: not_both(fs_type, '--fs-type', existing_svnrepos, '--existing-svnrepos') + not_both(use_rcs, '--use-rcs', + use_cvs, '--use-cvs') + + not_both(use_rcs, '--use-rcs', + use_internal_co, '--use-internal-co') + + not_both(use_cvs, '--use-cvs', + use_internal_co, '--use-internal-co') + + not_both(ctx.trunk_only, '--trunk-only', + force_branch, '--force-branch') + + not_both(ctx.trunk_only, '--trunk-only', + force_tag, '--force-tag') + if fs_type and fs_type != 'bdb' and bdb_txn_nosync: raise FatalError("cannot pass --bdb-txn-nosync with --fs-type=%s." 
% fs_type) @@ -402,12 +516,27 @@ class RunOptions: else: ctx.output_option = DumpfileOutputOption(dumpfile) + if use_rcs: + ctx.revision_reader = RCSRevisionReader(co_executable) + elif use_cvs: + ctx.revision_reader = CVSRevisionReader(cvs_executable) + else: + # --use-internal-co is the default: + ctx.revision_reader = InternalRevisionReader(compress=True) + # Create the default project (using ctx.trunk, ctx.branches, and # ctx.tags): ctx.add_project(Project( cvsroot, trunk_base, branches_base, tags_base, symbol_transforms=symbol_transforms)) + try: + ctx.utf8_encoder = UTF8Encoder(encodings, fallback_encoding) + # Don't use fallback_encoding for filenames: + ctx.filename_utf8_encoder = UTF8Encoder(encodings) + except LookupError, e: + raise FatalError(str(e)) + ctx.symbol_strategy.add_rule(UnambiguousUsageRule()) if symbol_strategy_default == 'strict': pass @@ -421,31 +550,30 @@ class RunOptions: else: assert False - ctx.svn_property_setters.append(ExecutablePropertySetter()) - - ctx.svn_property_setters.append(BinaryFileEOLStyleSetter()) + if auto_props_file: + ctx.svn_property_setters.append(AutoPropsPropertySetter( + auto_props_file, auto_props_ignore_case)) if mime_types_file: ctx.svn_property_setters.append(MimeMapper(mime_types_file)) - if auto_props_file: - ctx.svn_property_setters.append(AutoPropsPropertySetter( - auto_props_file, auto_props_ignore_case)) + ctx.svn_property_setters.append(CVSBinaryFileEOLStyleSetter()) - ctx.svn_property_setters.append(BinaryFileDefaultMimeTypeSetter()) + ctx.svn_property_setters.append(CVSBinaryFileDefaultMimeTypeSetter()) if eol_from_mime_type: ctx.svn_property_setters.append(EOLStyleFromMimeTypeSetter()) - if no_default_eol: - ctx.svn_property_setters.append(DefaultEOLStyleSetter(None)) - else: - ctx.svn_property_setters.append(DefaultEOLStyleSetter('native')) + ctx.svn_property_setters.append(DefaultEOLStyleSetter(default_eol)) + + ctx.svn_property_setters.append(SVNBinaryFileKeywordsPropertySetter()) if not 
keywords_off: ctx.svn_property_setters.append( KeywordsPropertySetter(config.SVN_KEYWORDS_VALUE)) + ctx.svn_property_setters.append(ExecutablePropertySetter()) + def check_options(self): """Check that the run options are OK. @@ -521,4 +649,13 @@ class RunOptions: } execfile(options_filename, g, l) + def usage(self): + sys.stdout.write(usage_message_template % { + 'progname' : self.progname, + 'trunk_base' : config.DEFAULT_TRUNK_BASE, + 'branches_base' : config.DEFAULT_BRANCHES_BASE, + 'tags_base' : config.DEFAULT_TAGS_BASE, + 'svn_keywords_value' : config.SVN_KEYWORDS_VALUE, + }) + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/serializer.py cvs2svn-2.0.0/cvs2svn_lib/serializer.py --- cvs2svn-1.5.x/cvs2svn_lib/serializer.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/serializer.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,172 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2000-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/.
+# ==================================================================== + +"""Picklers and unpicklers that are primed with known objects.""" + +from __future__ import generators + +import cStringIO +import marshal +import cPickle +import zlib + +from cvs2svn_lib.boolean import * + + +class Serializer: + """An object able to serialize/deserialize some class of objects.""" + + def dumpf(self, f, object): + """Serialize OBJECT to file-like object F.""" + + raise NotImplementedError() + + def dumps(self, object): + """Return a string containing OBJECT in serialized form.""" + + raise NotImplementedError() + + def loadf(self, f): + """Return the next object deserialized from file-like object F.""" + + raise NotImplementedError() + + def loads(self, s): + """Return the object deserialized from string S.""" + + raise NotImplementedError() + + +class StringSerializer(Serializer): + """This class serializes/deserializes strings. + + Dumps and loads are simple pass-throughs, while dumpf and loadf use + marshal (so the serialized values know their own length in the file). + As a consequence, the two storage methods must not be mixed.""" + + def dumpf(self, f, object): + marshal.dump(object, f) + + def dumps(self, object): + return object + + def loadf(self, f): + return marshal.load(f) + + def loads(self, s): + return s + + +class MarshalSerializer(Serializer): + """This class uses the marshal module to serialize/deserialize. + + This means that it shares the limitations of the marshal module, + namely only being able to serialize a few simple python data types + without reference loops.""" + + def dumpf(self, f, object): + marshal.dump(object, f) + + def dumps(self, object): + return marshal.dumps(object) + + def loadf(self, f): + return marshal.load(f) + + def loads(self, s): + return marshal.loads(s) + + +class PrimedPickleSerializer(Serializer): + """This class acts as a pickler/unpickler with a pre-initialized memo. 
+ + The picklers and unpicklers are 'pre-trained' to recognize the + objects that are in the primer. If objects are recognized + from PRIMER, then only their persistent IDs need to be pickled + instead of the whole object. (Note that the memos needed for + pickling and unpickling are different.) + + A new pickler/unpickler is created for each use, each time with the + memo initialized appropriately for pickling or unpickling.""" + + def __init__(self, primer): + """Prepare to make picklers/unpicklers with the specified primer. + + The Pickler and Unpickler are 'primed' by pre-pickling PRIMER, + which can be an arbitrary object (e.g., a list of objects that are + expected to occur frequently in the objects to be serialized).""" + + f = cStringIO.StringIO() + pickler = cPickle.Pickler(f, -1) + pickler.dump(primer) + self.pickler_memo = pickler.memo + + unpickler = cPickle.Unpickler(cStringIO.StringIO(f.getvalue())) + unpickler.load() + self.unpickler_memo = unpickler.memo + + def dumpf(self, f, object): + """Serialize OBJECT to file-like object F.""" + + pickler = cPickle.Pickler(f, -1) + pickler.memo = self.pickler_memo.copy() + pickler.dump(object) + + def dumps(self, object): + """Return a string containing OBJECT in serialized form.""" + + f = cStringIO.StringIO() + self.dumpf(f, object) + return f.getvalue() + + def loadf(self, f): + """Return the next object deserialized from file-like object F.""" + + unpickler = cPickle.Unpickler(f) + unpickler.memo = self.unpickler_memo.copy() + return unpickler.load() + + def loads(self, s): + """Return the object deserialized from string S.""" + + return self.loadf(cStringIO.StringIO(s)) + + +class CompressingSerializer(Serializer): + """This class wraps other Serializers to compress their serialized data. + + The bit streams for dumps and loads are different from those of dumpf + and loadf for the reasons explained in StringSerializer.""" + + def __init__(self, wrapee): + """Constructor. 
WRAPEE is the Serializer whose bitstream ought to be + compressed.""" + + self.wrapee = wrapee + + def dumpf(self, f, object): + marshal.dump(self.dumps(object), f) + + def dumps(self, object): + return zlib.compress(self.wrapee.dumps(object), 9) + + def loadf(self, f): + return self.loads(marshal.load(f)) + + def loads(self, s): + return self.wrapee.loads(zlib.decompress(s)) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/set_support.py cvs2svn-2.0.0/cvs2svn_lib/set_support.py --- cvs2svn-1.5.x/cvs2svn_lib/set_support.py 2006-06-11 17:01:43.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/set_support.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2006 CollabNet. All rights reserved. +# Copyright (c) 2006-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -57,9 +57,51 @@ except NameError: def remove(self, value): del self._dict[value] + def discard(self, value): + try: + self.remove(value) + except KeyError: + pass + def pop(self): return self._dict.popitem()[0] + def clear(self): + self._dict.clear() + + def difference(self, other): + retval = set() + for x in self: + if x not in other: + retval.add(x) + + return retval + + def __sub__(self, other): + return self.difference(other) + + def __and__(self, other): + """Set intersection.""" + + if len(self) <= len(other): + s1, s2 = self, other + else: + s1, s2 = other, self + + retval = set() + for x in s1: + if x in s2: + retval.add(x) + return retval + + def update(self, other): + for x in other: + self.add(x) + return self + + def __ior__(self, other): + return self.update(other) + def __repr__(self): return 'Set(%r)' % (self._dict.keys(),) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/stats_keeper.py cvs2svn-2.0.0/cvs2svn_lib/stats_keeper.py --- cvs2svn-1.5.x/cvs2svn_lib/stats_keeper.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/stats_keeper.py 2007-08-15 22:53:53.000000000 +0200 @@ -27,11 +27,16 @@ from cvs2svn_lib.boolean import * from cvs2svn_lib.set_support import * from cvs2svn_lib import config from cvs2svn_lib.artifact_manager import artifact_manager +from cvs2svn_lib.cvs_item import CVSRevision +from cvs2svn_lib.cvs_item import CVSBranch +from cvs2svn_lib.cvs_item import CVSTag class StatsKeeper: def __init__(self): self._cvs_revs_count = 0 + self._cvs_branches_count = 0 + self._cvs_tags_count = 0 # A set of tag_ids seen: self._tag_ids = set() # A set of branch_ids seen: @@ -47,8 +52,8 @@ class StatsKeeper: self._stats_reflect_exclude = False self._repos_files = set() - def log_duration_for_pass(self, duration, pass_num): - self._pass_timings[pass_num] = duration + def log_duration_for_pass(self, duration, pass_num, pass_name): + self._pass_timings[pass_num] = (pass_name, duration,) def set_start_time(self, start): 
self._start_time = start @@ -61,6 +66,8 @@ class StatsKeeper: def reset_cvs_rev_info(self): self._cvs_revs_count = 0 + self._cvs_branches_count = 0 + self._cvs_tags_count = 0 self._tag_ids = set() self._branch_ids = set() @@ -72,14 +79,9 @@ class StatsKeeper: self._repos_file_count = len(self._repos_files) - def record_cvs_rev(self, cvs_rev): + def _record_cvs_rev(self, cvs_rev): self._cvs_revs_count += 1 - for tag_id in cvs_rev.tag_ids: - self._tag_ids.add(tag_id) - for branch_id in cvs_rev.branch_ids: - self._branch_ids.add(branch_id) - if cvs_rev.timestamp < self._first_rev_date: self._first_rev_date = cvs_rev.timestamp @@ -88,6 +90,24 @@ class StatsKeeper: self._record_cvs_file(cvs_rev.cvs_file) + def _record_cvs_branch(self, cvs_branch): + self._cvs_branches_count += 1 + self._branch_ids.add(cvs_branch.symbol.id) + + def _record_cvs_tag(self, cvs_tag): + self._cvs_tags_count += 1 + self._tag_ids.add(cvs_tag.symbol.id) + + def record_cvs_item(self, cvs_item): + if isinstance(cvs_item, CVSRevision): + self._record_cvs_rev(cvs_item) + elif isinstance(cvs_item, CVSBranch): + self._record_cvs_branch(cvs_item) + elif isinstance(cvs_item, CVSTag): + self._record_cvs_tag(cvs_item) + else: + raise RuntimeError('Unknown CVSItem type') + def set_svn_rev_count(self, count): self._svn_rev_count = count @@ -121,6 +141,8 @@ class StatsKeeper: '------------------\n' \ 'Total CVS Files: %10i\n' \ 'Total CVS Revisions: %10i\n' \ + 'Total CVS Branches: %10i\n' \ + 'Total CVS Tags: %10i\n' \ 'Total Unique Tags: %10i\n' \ 'Total Unique Branches: %10i\n' \ 'CVS Repos Size in KB: %10i\n' \ @@ -131,6 +153,8 @@ class StatsKeeper: '%s' % (self._repos_file_count, self._cvs_revs_count, + self._cvs_branches_count, + self._cvs_tags_count, len(self._tag_ids), len(self._branch_ids), (self._repos_size / 1024), @@ -143,20 +167,22 @@ class StatsKeeper: def timings(self): passes = self._pass_timings.keys() passes.sort() - output = 'Timings:\n------------------\n' + output = 'Timings 
(seconds):\n------------------\n' + + total = self._end_time - self._start_time - def desc(val): - if val == 1: return "second" - return "seconds" + # Output times with up to 3 decimal places: + decimals = max(0, 4 - len('%d' % int(total))) + length = len(('%%.%df' % decimals) % total) + format = '%%%d.%df' % (length, decimals,) for pass_num in passes: - duration = int(self._pass_timings[pass_num]) - p_str = ('pass %d:%6d %s\n' - % (pass_num, duration, desc(duration))) + (pass_name, duration,) = self._pass_timings[pass_num] + p_str = ((format + ' pass%-2d %s\n') + % (duration, pass_num, pass_name,)) output += p_str - total = int(self._end_time - self._start_time) - output += ('total: %6d %s' % (total, desc(total))) + output += ((format + ' total') % total) return output diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/stdout_delegate.py cvs2svn-2.0.0/cvs2svn_lib/stdout_delegate.py --- cvs2svn-1.5.x/cvs2svn_lib/stdout_delegate.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/stdout_delegate.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -48,14 +48,17 @@ class StdoutDelegate(SVNRepositoryMirror Log().verbose(" New Directory", path) def add_path(self, s_item): - """Print a line stating that we are 'adding' s_item.cvs_rev.svn_path.""" + """Print a line stating what path we are 'adding'.""" - Log().verbose(" Adding", s_item.cvs_rev.svn_path) + Log().verbose(" Adding", s_item.cvs_rev.get_svn_path()) def change_path(self, s_item): - """Print a line stating that we are 'changing' s_item.cvs_rev.svn_path.""" + """Print a line stating what path we are 'changing'.""" - Log().verbose(" Changing", s_item.cvs_rev.svn_path) + Log().verbose(" Changing", s_item.cvs_rev.get_svn_path()) + + def skip_path(self, cvs_rev): + pass def delete_path(self, path): """Print a line stating that we are 'deleting' PATH.""" diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/svn_commit.py cvs2svn-2.0.0/cvs2svn_lib/svn_commit.py --- cvs2svn-1.5.x/cvs2svn_lib/svn_commit.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/svn_commit.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,20 +14,21 @@ # history and logs, available at http://cvs2svn.tigris.org/. 
# ==================================================================== -"""This module contains the CVSCommit class.""" +"""This module contains the SVNCommit classes.""" from cvs2svn_lib.boolean import * +from cvs2svn_lib.common import InternalError from cvs2svn_lib.common import format_date from cvs2svn_lib.common import warning_prefix -from cvs2svn_lib.common import OP_ADD -from cvs2svn_lib.common import OP_CHANGE -from cvs2svn_lib.common import OP_DELETE -from cvs2svn_lib.common import to_utf8 from cvs2svn_lib.context import Ctx from cvs2svn_lib.log import Log -from cvs2svn_lib.symbol import BranchSymbol -from cvs2svn_lib.symbol import TagSymbol +from cvs2svn_lib.symbol import Branch +from cvs2svn_lib.symbol import Tag +from cvs2svn_lib.cvs_item import CVSRevisionAdd +from cvs2svn_lib.cvs_item import CVSRevisionChange +from cvs2svn_lib.cvs_item import CVSRevisionDelete +from cvs2svn_lib.cvs_item import CVSRevisionNoop class SVNCommit: @@ -46,13 +47,7 @@ class SVNCommit: # to create trunk, tags, and branches. revnum = 2 - class SVNCommitInternalInconsistencyError(Exception): - """Exception raised if we encounter an impossible state in the - SVNCommit Databases.""" - - pass - - def __init__(self, description, revnum=None): + def __init__(self, description, date, revnum=None): """Instantiate an SVNCommit. DESCRIPTION is for debugging only. If REVNUM, the SVNCommit will correspond to that revision number; and if CVS_REVS, then they must be the exact set of CVSRevisions for @@ -64,6 +59,11 @@ class SVNCommit: self.description = description + # The date of the commit, as an integer. While the SVNCommit is + # being built up, this contains the latest date seen so far. This + # member is set externally. + self.date = date + # Revprop metadata for this commit. # # These initial values are placeholders. At least the log and the @@ -76,17 +76,17 @@ class SVNCommit: self._author = Ctx().username self._log_msg = "This log message means an SVNCommit was used too soon." 
- # The date of the commit, as an integer. While the SVNCommit is - # being built up, this contains the latest date seen so far. This - # member is set externally. - self.date = 0 - if revnum: self.revnum = revnum else: self.revnum = SVNCommit.revnum SVNCommit.revnum += 1 + def get_cvs_items(self): + """Return a list containing the CVSItems in this commit.""" + + raise NotImplementedError() + def _get_log_msg(self): """Return a log message for this commit.""" @@ -100,8 +100,8 @@ class SVNCommit: try: utf8_author = None if self._author is not None: - utf8_author = to_utf8(self._author) - utf8_log = to_utf8(log_msg) + utf8_author = Ctx().utf8_encoder(self._author) + utf8_log = Ctx().utf8_encoder(log_msg) return { 'svn:author' : utf8_author, 'svn:log' : utf8_log, 'svn:date' : date } @@ -119,7 +119,8 @@ class SVNCommit: Log().warn("(subversion rev %s)" % self.revnum) Log().warn( - "Consider rerunning with one or more '--encoding' parameters.\n") + "Consider rerunning with one or more '--encoding' parameters or\n" + "with '--fallback-encoding'.\n") # It's better to fall back to the original (unknown encoding) data # than to either 1) quit or 2) record nothing at all. 
return { 'svn:author' : self._author, @@ -144,9 +145,10 @@ class SVNRevisionCommit(SVNCommit): Derived classes must also call the SVNCommit constructor explicitly.""" - self.cvs_revs = [] - for cvs_rev in cvs_revs: - self.cvs_revs.append(cvs_rev) + self.cvs_revs = list(cvs_revs) + + def get_cvs_items(self): + return self.cvs_revs def __getstate__(self): """Return the part of the state represented by this mixin.""" @@ -159,11 +161,8 @@ class SVNRevisionCommit(SVNCommit): cvs_rev_keys = state cvs_revs = [] - for key in cvs_rev_keys: - cvs_rev_id = int(key, 16) - cvs_rev = Ctx()._cvs_items_db[cvs_rev_id] - cvs_revs.append(cvs_rev) - + keys = [int(key, 16) for key in cvs_rev_keys] + cvs_revs = Ctx()._cvs_items_db.get_many(keys) SVNRevisionCommit.__init__(self, cvs_revs) # Set the author and log message for this commit from the first @@ -186,8 +185,10 @@ class SVNRevisionCommit(SVNCommit): class SVNInitialProjectCommit(SVNCommit): def __init__(self, date, revnum=None): - SVNCommit.__init__(self, 'Initialization', revnum) - self.date = date + SVNCommit.__init__(self, 'Initialization', date, revnum) + + def get_cvs_items(self): + return [] def _get_log_msg(self): return 'New repository initialized by cvs2svn.' @@ -199,6 +200,8 @@ class SVNInitialProjectCommit(SVNCommit) repos.start_commit(self.revnum, self._get_revprops()) for project in Ctx().projects: + # For a trunk-only conversion, trunk_path might be ''. 
+ if project.trunk_path: repos.mkdir(project.trunk_path) if not Ctx().trunk_only: repos.mkdir(project.branches_path) @@ -208,10 +211,13 @@ class SVNInitialProjectCommit(SVNCommit) class SVNPrimaryCommit(SVNCommit, SVNRevisionCommit): - def __init__(self, cvs_revs, revnum=None): - SVNCommit.__init__(self, 'commit', revnum) + def __init__(self, cvs_revs, date, revnum=None): + SVNCommit.__init__(self, 'commit', date, revnum) SVNRevisionCommit.__init__(self, cvs_revs) + def get_cvs_items(self): + return SVNRevisionCommit.get_cvs_items(self) + def __str__(self): return SVNCommit.__str__(self) + SVNRevisionCommit.__str__(self) @@ -233,44 +239,16 @@ class SVNPrimaryCommit(SVNCommit, SVNRev Log().verbose("Committing %d CVSRevision%s" % (len(self.cvs_revs), plural)) for cvs_rev in self.cvs_revs: - if cvs_rev.op == OP_DELETE: - repos.delete_path(cvs_rev.svn_path, Ctx().prune) - - elif (cvs_rev.rev == "1.1.1.1" - and not cvs_rev.deltatext_exists - and repos.path_exists(cvs_rev.svn_path)): - # This change can be omitted. See comment in - # CVSCommit._commit() for what this is all about. Note that - # although asking repos.path_exists() is somewhat expensive, - # we only do it if the first two (cheap) tests succeed first. + if isinstance(cvs_rev, CVSRevisionNoop): pass - elif cvs_rev.op == OP_ADD: - repos.add_path(cvs_rev) + elif isinstance(cvs_rev, CVSRevisionDelete): + repos.delete_path(cvs_rev.get_svn_path(), Ctx().prune) - elif cvs_rev.op == OP_CHANGE: - # Fix for Issue #74: - # - # Here's the scenario. You have file FOO that is imported - # on a non-trunk vendor branch. So in r1.1 and r1.1.1.1, - # the file exists. - # - # Moving forward in time, FOO is deleted on the default - # branch (r1.1.1.2). cvs2svn determines that this delete - # also needs to happen on trunk, so FOO is deleted on - # trunk. - # - # Along come r1.2, whose op is OP_CHANGE (because r1.1 is - # not 'dead', we assume it's a change). 
However, since - # our trunk file has been deleted, svnadmin blows up--you - # can't change a file that doesn't exist! - # - # Soooo... we just check the path, and if it doesn't - # exist, we do an add... if the path does exist, it's - # business as usual. - if not repos.path_exists(cvs_rev.svn_path): + elif isinstance(cvs_rev, CVSRevisionAdd): repos.add_path(cvs_rev) - else: + + elif isinstance(cvs_rev, CVSRevisionChange): repos.change_path(cvs_rev) repos.end_commit() @@ -280,27 +258,31 @@ class SVNPrimaryCommit(SVNCommit, SVNRev def __setstate__(self, state): (revnum, date, rev_state,) = state - SVNCommit.__init__(self, "Retrieved from disk", revnum) + SVNCommit.__init__(self, "Retrieved from disk", date, revnum) SVNRevisionCommit.__setstate__(self, rev_state) - self.date = date - class SVNSymbolCommit(SVNCommit): - def __init__(self, description, symbol, revnum=None): - SVNCommit.__init__(self, description, revnum) + def __init__(self, symbol, cvs_symbol_ids, date, revnum=None): + SVNCommit.__init__( + self, 'copying to tag/branch %r' % symbol.name, date, revnum) # The TypedSymbol that is filled in this SVNCommit. self.symbol = symbol + self.cvs_symbol_ids = cvs_symbol_ids + + def get_cvs_items(self): + return list(Ctx()._cvs_items_db.get_many(self.cvs_symbol_ids)) + def _get_log_msg(self): """Return a manufactured log message for this commit.""" # Determine whether self.symbol is a tag. - if isinstance(self.symbol, TagSymbol): + if isinstance(self.symbol, Tag): type = 'tag' else: - assert isinstance(self.symbol, BranchSymbol) + assert isinstance(self.symbol, Branch) type = 'branch' # In Python 2.2.3, we could use textwrap.fill(). Oh well :-). 
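The hunk above replaces the old op-code comparisons (`OP_ADD`, `OP_CHANGE`, `OP_DELETE`) with `isinstance` tests against `CVSRevision` subclasses. A minimal sketch of that dispatch pattern, with stub classes standing in for the real ones in `cvs2svn_lib.cvs_item` (the stubs and the `repos_ops` list are illustrative only):

```python
# Stub stand-ins for the real CVSRevision subclasses in
# cvs2svn_lib.cvs_item; only the class hierarchy matters here.
class CVSRevision: pass
class CVSRevisionNoop(CVSRevision): pass
class CVSRevisionDelete(CVSRevision): pass
class CVSRevisionAdd(CVSRevision): pass
class CVSRevisionChange(CVSRevision): pass

def apply_revision(cvs_rev, repos_ops):
    """Dispatch on the revision's type instead of comparing op codes.

    repos_ops is a plain list collecting operation names, standing in
    for the repository-mirror calls made in the real commit() method."""
    if isinstance(cvs_rev, CVSRevisionNoop):
        pass                        # nothing to do for a noop
    elif isinstance(cvs_rev, CVSRevisionDelete):
        repos_ops.append('delete')
    elif isinstance(cvs_rev, CVSRevisionAdd):
        repos_ops.append('add')
    elif isinstance(cvs_rev, CVSRevisionChange):
        repos_ops.append('change')
    return repos_ops
```

Moving the add-vs-change decision into the item types themselves (rather than re-deriving it at commit time from `path_exists()` checks) is what lets this hunk delete the long Issue #74 workaround comment.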
@@ -317,19 +299,17 @@ class SVNSymbolCommit(SVNCommit): repos.start_commit(self.revnum, self._get_revprops()) Log().verbose("Filling symbolic name:", self.symbol.get_clean_name()) - repos.fill_symbol(self.symbol) + repos.fill_symbol(self) repos.end_commit() def __getstate__(self): - return (self.revnum, self.symbol.id, self.date) + return (self.revnum, self.symbol.id, self.cvs_symbol_ids, self.date) def __setstate__(self, state): - (revnum, symbol_id, date) = state + (revnum, symbol_id, cvs_symbol_ids, date) = state symbol = Ctx()._symbol_db.get_symbol(symbol_id) - SVNSymbolCommit.__init__(self, "Retrieved from disk", symbol, revnum) - - self.date = date + SVNSymbolCommit.__init__(self, symbol, cvs_symbol_ids, date, revnum) def __str__(self): """ Print a human-readable description of this SVNCommit. @@ -341,15 +321,9 @@ class SVNSymbolCommit(SVNCommit): + " symbolic name: %s\n" % self.symbol.get_clean_name()) -class SVNPreCommit(SVNSymbolCommit): - def __init__(self, symbol, revnum=None): - SVNSymbolCommit.__init__( - self, 'pre-commit symbolic name %r' % symbol.name, symbol, revnum) - - class SVNPostCommit(SVNCommit, SVNRevisionCommit): - def __init__(self, motivating_revnum, cvs_revs, revnum=None): - SVNCommit.__init__(self, 'post-commit default branch(es)', revnum) + def __init__(self, motivating_revnum, cvs_revs, date): + SVNCommit.__init__(self, 'post-commit default branch(es)', date) SVNRevisionCommit.__init__(self, cvs_revs) # The subversion revision number of the *primary* commit where the @@ -366,6 +340,13 @@ class SVNPostCommit(SVNCommit, SVNRevisi # multiple different default branches. self._motivating_revnum = motivating_revnum + def get_cvs_items(self): + # It might seem that we should return + # SVNRevisionCommit.get_cvs_items(self) here, but this commit + # doesn't really include those CVSItems, but rather followup + # commits to those. 
+ return [] + def __str__(self): return SVNCommit.__str__(self) + SVNRevisionCommit.__str__(self) @@ -382,8 +363,9 @@ class SVNPostCommit(SVNCommit, SVNRevisi """Commit SELF to REPOS, which is a SVNRepositoryMirror. Propagate any changes that happened on a non-trunk default branch - to the trunk of the repository. See CVSCommit._post_commit() for - details on why this is necessary.""" + to the trunk of the repository. See + SVNCommitCreator._post_commit() for details on why this is + necessary.""" repos.start_commit(self.revnum, self._get_revprops()) @@ -391,19 +373,30 @@ class SVNPostCommit(SVNCommit, SVNRevisi % self._motivating_revnum) for cvs_rev in self.cvs_revs: - svn_trunk_path = cvs_rev.cvs_file.project.make_trunk_path( + svn_trunk_path = cvs_rev.cvs_file.project.get_trunk_path( cvs_rev.cvs_path) - if cvs_rev.op == OP_ADD or cvs_rev.op == OP_CHANGE: - if repos.path_exists(svn_trunk_path): - # Delete the path on trunk... + if isinstance(cvs_rev, CVSRevisionAdd): + # Copy from branch to trunk: + repos.copy_path( + cvs_rev.get_svn_path(), svn_trunk_path, + self._motivating_revnum, True + ) + elif isinstance(cvs_rev, CVSRevisionChange): + # Delete old version of the path on trunk... 
repos.delete_path(svn_trunk_path) - # ...and copy over from branch + # ...and copy the new version over from branch: repos.copy_path( - cvs_rev.svn_path, svn_trunk_path, self._motivating_revnum) - else: - assert cvs_rev.op == OP_DELETE - # delete trunk path + cvs_rev.get_svn_path(), svn_trunk_path, + self._motivating_revnum, True + ) + elif isinstance(cvs_rev, CVSRevisionDelete): + # Delete trunk path: repos.delete_path(svn_trunk_path) + elif isinstance(cvs_rev, CVSRevisionNoop): + # Do nothing + pass + else: + raise InternalError('Unexpected CVSRevision type: %s' % (cvs_rev,)) repos.end_commit() @@ -414,17 +407,9 @@ class SVNPostCommit(SVNCommit, SVNRevisi def __setstate__(self, state): (revnum, motivating_revnum, date, rev_state,) = state - SVNCommit.__init__(self, "Retrieved from disk", revnum) + SVNCommit.__init__(self, "Retrieved from disk", date, revnum) SVNRevisionCommit.__setstate__(self, rev_state) - self.date = date self._motivating_revnum = motivating_revnum -class SVNSymbolCloseCommit(SVNSymbolCommit): - def __init__(self, symbol, date, revnum=None): - SVNSymbolCommit.__init__( - self, 'closing tag/branch %r' % symbol.name, symbol, revnum) - self.date = date - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/svn_commit_creator.py cvs2svn-2.0.0/cvs2svn_lib/svn_commit_creator.py --- cvs2svn-1.5.x/cvs2svn_lib/svn_commit_creator.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/svn_commit_creator.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,191 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2000-2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. 
+# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module contains the SVNCommitCreator class.""" + + +from __future__ import generators + +import time + +from cvs2svn_lib.boolean import * +from cvs2svn_lib.set_support import * +from cvs2svn_lib import config +from cvs2svn_lib.common import warning_prefix +from cvs2svn_lib.common import InternalError +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.common import DB_OPEN_READ +from cvs2svn_lib.log import Log +from cvs2svn_lib.context import Ctx +from cvs2svn_lib.artifact_manager import artifact_manager +from cvs2svn_lib.symbol import Branch +from cvs2svn_lib.database import Database +from cvs2svn_lib.cvs_item import CVSRevisionDelete +from cvs2svn_lib.cvs_item import CVSRevisionNoop +from cvs2svn_lib.cvs_item import CVSBranchNoop +from cvs2svn_lib.cvs_item import CVSTagNoop +from cvs2svn_lib.changeset import OrderedChangeset +from cvs2svn_lib.changeset import BranchChangeset +from cvs2svn_lib.changeset import TagChangeset +from cvs2svn_lib.svn_commit import SVNCommit +from cvs2svn_lib.svn_commit import SVNPrimaryCommit +from cvs2svn_lib.svn_commit import SVNSymbolCommit +from cvs2svn_lib.svn_commit import SVNPostCommit + + +class SVNCommitCreator: + """This class creates and yields SVNCommits via process_changeset().""" + + def _post_commit(self, cvs_revs, motivating_revnum, timestamp): + """Generate any SVNCommits needed to follow CVS_REVS. + + That is, handle non-trunk default branches. A revision on a CVS + non-trunk default branch is visible in a default CVS checkout of + HEAD. 
So we copy such commits over to Subversion's trunk so that + checking out SVN trunk gives the same output as checking out of + CVS's default branch.""" + + cvs_revs = [ + cvs_rev + for cvs_rev in cvs_revs + if (cvs_rev.default_branch_revision + and not isinstance(cvs_rev, CVSRevisionNoop)) + ] + + if cvs_revs: + cvs_revs.sort( + lambda a, b: cmp(a.cvs_file.filename, b.cvs_file.filename) + ) + # Generate an SVNCommit for all of our default branch cvs_revs. + yield SVNPostCommit(motivating_revnum, cvs_revs, timestamp) + + def _process_revision_changeset(self, changeset, timestamp): + """Process CHANGESET, using TIMESTAMP as the commit time. + + Create and yield one or more SVNCommits in the process. CHANGESET + must be an OrderedChangeset. TIMESTAMP is used as the timestamp + for any resulting SVNCommits.""" + + if not changeset.cvs_item_ids: + Log().warn('Changeset has no items: %r' % changeset) + return + + Log().verbose('-' * 60) + Log().verbose('CVS Revision grouping:') + Log().verbose(' Time: %s' % time.ctime(timestamp)) + + # Generate an SVNCommit unconditionally. Even if the only change in + # this group of CVSRevisions is a deletion of an already-deleted + # file (that is, a CVS revision in state 'dead' whose predecessor + # was also in state 'dead'), the conversion will still generate a + # Subversion revision containing the log message for the second dead + # revision, because we don't want to lose that information. + + cvs_revs = list(changeset.get_cvs_items()) + if cvs_revs: + cvs_revs.sort(lambda a, b: cmp(a.cvs_file.filename, b.cvs_file.filename)) + svn_commit = SVNPrimaryCommit(cvs_revs, timestamp) + + yield svn_commit + + for cvs_rev in cvs_revs: + Ctx()._symbolings_logger.log_revision(cvs_rev, svn_commit.revnum) + + # Generate an SVNPostCommit if we have default branch revs. 
If + # some of the revisions in this commit happened on a non-trunk + # default branch, then those files have to be copied into trunk + # manually after being changed on the branch (because the RCS + # "default branch" appears as head, i.e., trunk, in practice). + # Unfortunately, Subversion doesn't support copies with sources + # in the current txn. All copies must be based in committed + # revisions. Therefore, we generate the copies in a new + # revision. + for svn_post_commit in self._post_commit( + cvs_revs, svn_commit.revnum, timestamp + ): + yield svn_post_commit + + def _process_tag_changeset(self, changeset, timestamp): + """Process TagChangeset CHANGESET, producing a SVNSymbolCommit. + + Filter out CVSTagNoops. If no CVSTags are left, don't generate a + SVNSymbolCommit.""" + + if Ctx().trunk_only: + raise InternalError( + 'TagChangeset encountered during a --trunk-only conversion') + + cvs_tag_ids = [ + cvs_tag.id + for cvs_tag in changeset.get_cvs_items() + if not isinstance(cvs_tag, CVSTagNoop) + ] + if cvs_tag_ids: + yield SVNSymbolCommit(changeset.symbol, cvs_tag_ids, timestamp) + else: + Log().debug( + 'Omitting %r because it contains only CVSTagNoops' % (changeset,) + ) + + def _process_branch_changeset(self, changeset, timestamp): + """Process BranchChangeset CHANGESET, producing a SVNSymbolCommit. + + Filter out CVSBranchNoops. 
If no CVSBranches are left, don't + generate a SVNSymbolCommit.""" + + if Ctx().trunk_only: + raise InternalError( + 'BranchChangeset encountered during a --trunk-only conversion') + + cvs_branches = [ + cvs_branch + for cvs_branch in changeset.get_cvs_items() + if not isinstance(cvs_branch, CVSBranchNoop) + ] + if cvs_branches: + svn_commit = SVNSymbolCommit( + changeset.symbol, + [cvs_branch.id for cvs_branch in cvs_branches], + timestamp, + ) + yield svn_commit + for cvs_branch in cvs_branches: + Ctx()._symbolings_logger.log_branch_revision( + cvs_branch, svn_commit.revnum + ) + else: + Log().debug( + 'Omitting %r because it contains only CVSBranchNoops' % (changeset,) + ) + + def process_changeset(self, changeset, timestamp): + """Process CHANGESET, using TIMESTAMP for all of its entries. + + Return a generator that generates the resulting SVNCommits. + + The changesets must be fed to this function in proper dependency + order.""" + + if isinstance(changeset, OrderedChangeset): + return self._process_revision_changeset(changeset, timestamp) + elif isinstance(changeset, TagChangeset): + return self._process_tag_changeset(changeset, timestamp) + elif isinstance(changeset, BranchChangeset): + return self._process_branch_changeset(changeset, timestamp) + else: + raise TypeError('Illegal changeset %r' % changeset) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/svn_commit_item.py cvs2svn-2.0.0/cvs2svn_lib/svn_commit_item.py --- cvs2svn-1.5.x/cvs2svn_lib/svn_commit_item.py 2006-09-10 16:36:26.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/svn_commit_item.py 2007-08-15 22:53:53.000000000 +0200 @@ -40,17 +40,21 @@ class SVNCommitItem: self.svn_props_changed = svn_props_changed # The properties for this item as a map { key : value }. If VALUE - # is None, no property should be set. + # is None, the property should be left unset. 
self.svn_props = { } for svn_property_setter in Ctx().svn_property_setters: svn_property_setter.set_properties(self) - # Remember if we need to filter the EOLs. We could actually use - # self.svn_props now, since it is initialized for each revision. - self.needs_eol_filter = \ - self.svn_props.get('svn:eol-style', None) is not None + def needs_eol_filter(self): + """Return True iff EOLs needs to be filtered for this item. - self.has_keywords = self.svn_props.get('svn:keywords', None) is not None + This returns true for any svn:eol-style that does not indicate a + binary file.""" + + return bool(self.svn_props.get('svn:eol-style', None)) + + def has_keywords(self): + return bool(self.svn_props.get('svn:keywords', None)) diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/svn_repository_mirror.py cvs2svn-2.0.0/cvs2svn_lib/svn_repository_mirror.py --- cvs2svn-1.5.x/cvs2svn_lib/svn_repository_mirror.py 2006-09-02 10:44:50.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/svn_repository_mirror.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. 
The terms @@ -19,24 +19,97 @@ from cvs2svn_lib.boolean import * from cvs2svn_lib import config +from cvs2svn_lib.common import InternalError +from cvs2svn_lib.common import DB_OPEN_NEW +from cvs2svn_lib.common import DB_OPEN_READ from cvs2svn_lib.common import path_join from cvs2svn_lib.common import path_split -from cvs2svn_lib.context import Ctx from cvs2svn_lib.log import Log +from cvs2svn_lib.context import Ctx from cvs2svn_lib.key_generator import KeyGenerator from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import Database -from cvs2svn_lib.database import SDatabase -from cvs2svn_lib.database import DB_OPEN_NEW -from cvs2svn_lib.database import DB_OPEN_READ -from cvs2svn_lib.symbol import BranchSymbol -from cvs2svn_lib.symbol import TagSymbol -from cvs2svn_lib.symbolings_reader import SymbolingsReader -from cvs2svn_lib.fill_source import FillSource -from cvs2svn_lib.svn_revision_range import SVNRevisionRange +from cvs2svn_lib.serializer import MarshalSerializer +from cvs2svn_lib.database import IndexedDatabase +from cvs2svn_lib.record_table import UnsignedIntegerPacker +from cvs2svn_lib.record_table import RecordTable +from cvs2svn_lib.openings_closings import SymbolingsReader from cvs2svn_lib.svn_commit_item import SVNCommitItem +class _MirrorNode(object): + """Represent a node within the SVNRepositoryMirror. + + Instances of this class act like a map { component : _MirrorNode }, + where component is the path name component of an item within this + node (i.e., a file within this directory). + + Instances also have a particular path, even though the same node + content can have multiple paths within the same repository. The + path member indicates via what path the node was accessed. 
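The map-like behavior that the `_MirrorNode` docstring describes, where looking up a path component constructs a child node on demand and threads the access path through, can be sketched roughly like this (a toy version: a plain dict stands in for the on-disk node store, and the class name mirrors but is not the real implementation):

```python
class MirrorNode:
    """Toy sketch of _MirrorNode: a directory is a map from component
    name to child-node key; node objects are built on demand when a
    child is looked up, remembering the path they were reached by."""

    def __init__(self, nodes_db, path, key):
        self.nodes_db = nodes_db          # key -> {component: child key}
        self.path = path                  # path this node was accessed via
        self.key = key
        self.entries = nodes_db[key]

    def __getitem__(self, component):
        child_key = self.entries.get(component)
        if child_key is None:
            return None
        child_path = self.path + '/' + component if self.path else component
        return MirrorNode(self.nodes_db, child_path, child_key)

    def __contains__(self, component):
        return component in self.entries

# A tiny tree: '' -> trunk -> foo.c
nodes_db = {0: {'trunk': 1}, 1: {'foo.c': 2}, 2: {}}
root = MirrorNode(nodes_db, '', 0)
```

Because nodes are reconstructed per access, the same underlying key can be reached via different paths without storing a path in the persistent data, which is the space saving the docstring alludes to.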
+ + For space efficiency, SVNRepositoryMirror does not actually use this + class to store the data internally, but rather constructs instances + of this class on demand.""" + + def __init__(self, repo, path, key, entries): + # The SVNRepositoryMirror containing this directory: + self.repo = repo + + # The path of this node within the repository: + self.path = path + + # The key of this directory: + self.key = key + + # The entries within this directory (a map from component name to + # node): + self.entries = entries + + def get_subpath(self, *components): + return path_join(self.path, *components) + + def __getitem__(self, component): + """Return the _MirrorNode associated with the specified subnode. + + Return None if the specified subnode does not exist.""" + + key = self.entries.get(component, None) + if key is None: + return None + else: + return self.repo._get_node(self.get_subpath(component), key) + + def __contains__(self, component): + return component in self.entries + + def __iter__(self): + return self.entries.__iter__() + + +class _ReadOnlyMirrorNode(_MirrorNode): + """Represent a read-only node within the SVNRepositoryMirror.""" + + pass + + +class _WritableMirrorNode(_MirrorNode): + """Represent a writable node within the SVNRepositoryMirror.""" + + def __setitem__(self, component, node): + self.entries[component] = node.key + + def __delitem__(self, component): + del self.entries[component] + + def delete_component(self, component): + """Delete the COMPONENT from this directory and notify delagates. + + COMPONENT must exist in this node.""" + + del self[component] + self.repo._invoke_delegates('delete_path', self.get_subpath(component)) + + class SVNRepositoryMirror: """Mirror a Subversion Repository as it is constructed, one SVNCommit at a time. The mirror is skeletal; it does not contain @@ -45,24 +118,25 @@ class SVNRepositoryMirror: set delegates. The structure of the repository is kept in two databases and one - hash. 
The revs_db database maps revisions to root node keys, and - the nodes_db database maps node keys to nodes. A node is a hash - from directory names to keys. Both the revs_db and the nodes_db are - stored on disk and each access is expensive. - - The nodes_db database only has the keys for old revisions. The - revision that is being contructed is kept in memory in the new_nodes - hash which is cheap to access. + hash. The _svn_revs_root_nodes database maps revisions to root node + keys, and the _nodes_db database maps node keys to nodes. A node is + a hash from directory names to keys. Both the _svn_revs_root_nodes + and the _nodes_db are stored on disk and each access is expensive. + + The _nodes_db database only has the keys for old revisions. The + revision that is being contructed is kept in memory in the + _new_nodes map, which is cheap to access. - You must invoke start_commit() between SVNCommits. + You must invoke start_commit() before each SVNCommit and + end_commit() afterwards. 
- *** WARNING *** All path arguments to methods in this class CANNOT + *** WARNING *** Path arguments to methods in this class MUST NOT have leading or trailing slashes.""" class SVNRepositoryMirrorParentMissingError(Exception): """Exception raised if an attempt is made to add a path to the - repository mirror but the parent's path doesn't exist in the youngest - revision of the repository.""" + repository mirror but the parent's path doesn't exist in the + youngest revision of the repository.""" pass @@ -73,177 +147,180 @@ class SVNRepositoryMirror: pass - class SVNRepositoryMirrorInvalidFillOperationError(Exception): - """Exception raised if an empty SymbolFillingGuide is returned - during a fill where the branch in question already exists.""" - - pass - def __init__(self): """Set up the SVNRepositoryMirror and prepare it for SVNCommits.""" - self.key_generator = KeyGenerator() + self._key_generator = KeyGenerator() - self.delegates = [ ] + self._delegates = [ ] - # This corresponds to the 'revisions' table in a Subversion fs. - self.revs_db = SDatabase( - artifact_manager.get_temp_file(config.SVN_MIRROR_REVISIONS_DB), - DB_OPEN_NEW) + # A map from SVN revision number to root node number: + self._svn_revs_root_nodes = RecordTable( + artifact_manager.get_temp_file(config.SVN_MIRROR_REVISIONS_TABLE), + DB_OPEN_NEW, UnsignedIntegerPacker()) # This corresponds to the 'nodes' table in a Subversion fs. (We # don't need a 'representations' or 'strings' table because we # only track metadata, not file contents.) - self.nodes_db = Database( - artifact_manager.get_temp_file(config.SVN_MIRROR_NODES_DB), - DB_OPEN_NEW) + self._nodes_db = IndexedDatabase( + artifact_manager.get_temp_file(config.SVN_MIRROR_NODES_STORE), + artifact_manager.get_temp_file(config.SVN_MIRROR_NODES_INDEX_TABLE), + DB_OPEN_NEW, serializer=MarshalSerializer()) # Start at revision 0 without a root node. It will be created # by _open_writable_root_node. 
- self.youngest = 0 - self.new_root_key = None - self.new_nodes = { } + self._youngest = 0 - if not Ctx().trunk_only: - self.symbolings_reader = SymbolingsReader() + self._symbolings_reader = SymbolingsReader() def start_commit(self, revnum, revprops): """Start a new commit.""" - self.youngest = revnum - self.new_root_key = None - self.new_nodes = { } + self._youngest = revnum + self._new_root_node = None + self._new_nodes = { } self._invoke_delegates('start_commit', revnum, revprops) + if revnum == 1: + # For the first revision, we have to create the root directory + # out of thin air: + self._new_root_node = self._create_node('') + def end_commit(self): """Called at the end of each commit. This method copies the newly created nodes to the on-disk nodes db.""" - if self.new_root_key is None: + if self._new_root_node is None: # No changes were made in this revision, so we make the root node # of the new revision be the same as the last one. - self.revs_db[str(self.youngest)] = self.revs_db[str(self.youngest - 1)] + self._svn_revs_root_nodes[self._youngest] = \ + self._svn_revs_root_nodes[self._youngest - 1] else: - self.revs_db[str(self.youngest)] = self.new_root_key - # Copy the new nodes to the nodes_db - for key, value in self.new_nodes.items(): - self.nodes_db[key] = value + self._svn_revs_root_nodes[self._youngest] = self._new_root_node.key + # Copy the new nodes to the _nodes_db + for key, value in self._new_nodes.items(): + self._nodes_db[key] = value + + del self._new_root_node + del self._new_nodes self._invoke_delegates('end_commit') - def _get_node(self, key): - """Returns the node contents for KEY which may refer to either - self.nodes_db or self.new_nodes.""" + def _create_node(self, path, entries=None): + if entries is None: + entries = {} + else: + entries = entries.copy() + + node = _WritableMirrorNode( + self, path, self._key_generator.gen_id(), entries) + + self._new_nodes[node.key] = node.entries + return node + + def _get_node(self, path, key): 
+ """Returns the node for PATH and key KEY. - if key in self.new_nodes: - return self.new_nodes[key] + The node might be read from either self._nodes_db or + self._new_nodes. Return an instance of _MirrorNode.""" + + contents = self._new_nodes.get(key, None) + if contents is not None: + return _WritableMirrorNode(self, path, key, contents) else: - return self.nodes_db[key] + return _ReadOnlyMirrorNode(self, path, key, self._nodes_db[key]) def _open_readonly_node(self, path, revnum): - """Open a readonly node for PATH at revision REVNUM. Returns the - node key and node contents if the path exists, else (None, None).""" + """Open a readonly node for PATH at revision REVNUM. + + Return an instance of _MirrorNode if the path exists, else None.""" # Get the root key - if revnum == self.youngest: - if self.new_root_key is None: - node_key = self.revs_db[str(self.youngest - 1)] + if revnum == self._youngest: + if self._new_root_node is None: + node_key = self._svn_revs_root_nodes[self._youngest - 1] else: - node_key = self.new_root_key + node_key = self._new_root_node.key else: - node_key = self.revs_db[str(revnum)] + node_key = self._svn_revs_root_nodes[revnum] + node = self._get_node('', node_key) for component in path.split('/'): - node_contents = self._get_node(node_key) - node_key = node_contents.get(component, None) - if node_key is None: + node = node[component] + if node is None: return None - return node_key + return node def _open_writable_root_node(self): - """Open a writable root node. The current root node is returned - immeditely if it is already writable. If not, create a new one by - copying the contents of the root node of the previous version.""" + """Open and return a writable root node. - if self.new_root_key is not None: - return self.new_root_key, self.new_nodes[self.new_root_key] + The current root node is returned immeditely if it is already + writable. 
If not, create a new one by copying the contents of the + root node of the previous version.""" + + if self._new_root_node is None: + # Root node still has to be created for this revision: + old_root_node = self._get_node( + '', self._svn_revs_root_nodes[self._youngest - 1]) + self._new_root_node = self._create_node('', old_root_node.entries) - if self.youngest < 2: - new_contents = { } - else: - new_contents = self.nodes_db[self.revs_db[str(self.youngest - 1)]] - self.new_root_key = self.key_generator.gen_key() - self.new_nodes = { self.new_root_key: new_contents } - - return self.new_root_key, new_contents + return self._new_root_node def _open_writable_node(self, svn_path, create): - """Open a writable node for the path SVN_PATH, creating SVN_PATH - and any missing directories if CREATE is True.""" - - parent_key, parent_contents = self._open_writable_root_node() + """Open a writable node for the path SVN_PATH. - # Walk up the path, one node at a time. - path_so_far = None - components = svn_path.split('/') - for i in range(len(components)): - component = components[i] - path_so_far = path_join(path_so_far, component) - this_key = parent_contents.get(component, None) - if this_key is not None: + Iff CREATE is True, create a directory node at SVN_PATH and any + missing directories. Return an instance of _WritableMirrorNode, + or None if SVN_PATH doesn't exist and CREATE is not set.""" + + node = self._open_writable_root_node() + + if svn_path: + # Walk down the path, one node at a time. + for component in svn_path.split('/'): + new_node = node[component] + if new_node is not None: # The component exists. 
- this_contents = self.new_nodes.get(this_key, None) - if this_contents is None: - # Suck the node from the nodes_db, but update the key - this_contents = self.nodes_db[this_key] - this_key = self.key_generator.gen_key() - self.new_nodes[this_key] = this_contents - parent_contents[component] = this_key + if not isinstance(new_node, _WritableMirrorNode): + # Create a new node, with entries initialized to be the same + # as those of the old node: + new_node = self._create_node(new_node.path, new_node.entries) + node[component] = new_node elif create: - # The component does not exists, so we create it. - this_contents = { } - this_key = self.key_generator.gen_key() - self.new_nodes[this_key] = this_contents - parent_contents[component] = this_key - if i < len(components) - 1: - self._invoke_delegates('mkdir', path_so_far) + # The component does not exist, so we create it. + new_node = self._create_node(path_join(node.path, component)) + node[component] = new_node + self._invoke_delegates('mkdir', new_node.path) else: - # The component does not exists and we are not instructed to + # The component does not exist and we are not instructed to # create it, so we give up. - return None, None + return None - parent_key = this_key - parent_contents = this_contents + node = new_node - return this_key, this_contents + return node def path_exists(self, path): - """Return True iff PATH exists in self.youngest of the repository mirror. + """Return True iff PATH exists in self._youngest of the repository mirror. PATH must not start with '/'.""" - return self._open_readonly_node(path, self.youngest) is not None - - def _fast_delete_path(self, parent_path, parent_contents, component): - """Delete COMPONENT from the parent direcory PARENT_PATH with the - contents PARENT_CONTENTS. 
Do nothing if COMPONENT does not exist - in PARENT_CONTENTS.""" - - if component in parent_contents: - del parent_contents[component] - self._invoke_delegates('delete_path', - path_join(parent_path, component)) + return self._open_readonly_node(path, self._youngest) is not None def delete_path(self, svn_path, should_prune=False): - """Delete PATH from the tree. If SHOULD_PRUNE is true, then delete - all ancestor directories that are made empty when SVN_PATH is deleted. - In other words, SHOULD_PRUNE is like the -P option to 'cvs checkout'. - - NOTE: This function ignores requests to delete the root directory - or any directory for which any project's is_unremovable() method - returns True, either directly or by pruning.""" + """Delete SVN_PATH from the tree. + + SVN_PATH must currently exist. + + If SHOULD_PRUNE is true, then delete all ancestor directories that + are made empty when SVN_PATH is deleted. In other words, + SHOULD_PRUNE is like the -P option to 'cvs checkout'. + + This function ignores requests to delete the root directory or any + directory for which any project's is_unremovable() method returns + True, either directly or by pruning.""" if svn_path == '': return @@ -252,30 +329,24 @@ class SVNRepositoryMirror: return (parent_path, entry,) = path_split(svn_path) - if parent_path: - parent_key, parent_contents = \ - self._open_writable_node(parent_path, False) - else: - parent_key, parent_contents = self._open_writable_root_node() + parent_node = self._open_writable_node(parent_path, False) - if parent_key is not None: - self._fast_delete_path(parent_path, parent_contents, entry) + parent_node.delete_component(entry) # The following recursion makes pruning an O(n^2) operation in the # worst case (where n is the depth of SVN_PATH), but the worst case # is probably rare, and the constant cost is pretty low. Another # drawback is that we issue a delete for each path and not just # a single delete for the topmost directory pruned. 
- if should_prune and len(parent_contents) == 0: + if should_prune and len(parent_node.entries) == 0: self.delete_path(parent_path, True) def mkdir(self, path): """Create PATH in the repository mirror at the youngest revision.""" self._open_writable_node(path, True) - self._invoke_delegates('mkdir', path) def change_path(self, cvs_rev): - """Register a change in self.youngest for the CVS_REV's svn_path + """Register a change in self._youngest for the CVS_REV's svn_path in the repository mirror.""" # We do not have to update the nodes because our mirror is only @@ -286,44 +357,57 @@ class SVNRepositoryMirror: def add_path(self, cvs_rev): """Add the CVS_REV's svn_path to the repository mirror.""" - self._open_writable_node(cvs_rev.svn_path, True) + (parent_path, component,) = path_split(cvs_rev.get_svn_path()) + parent_node = self._open_writable_node(parent_path, True) + + assert component not in parent_node + + parent_node[component] = \ + self._create_node(path_join(parent_node.path, component)) + self._invoke_delegates('add_path', SVNCommitItem(cvs_rev, True)) - def copy_path(self, src_path, dest_path, src_revnum): - """Copy SRC_PATH at subversion revision number SRC_REVNUM to - DEST_PATH. In the youngest revision of the repository, DEST_PATH's - parent *must* exist, but DEST_PATH *cannot* exist. - - Return the node key and the contents of the new node at DEST_PATH - as a dictionary.""" - - # get the contents of the node of our src_path - src_key = self._open_readonly_node(src_path, src_revnum) - src_contents = self._get_node(src_key) + def skip_path(self, cvs_rev): + """This does nothing, except for allowing the delegate to handle + skipped revisions symmetrically.""" + self._invoke_delegates('skip_path', cvs_rev) + + def copy_path(self, src_path, dest_path, src_revnum, create_parent=False): + """Copy SRC_PATH at subversion revision number SRC_REVNUM to DEST_PATH. 
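The recursive pruning that `delete_path` performs (including the worst-case O(n^2) behavior the comment in this hunk mentions) can be illustrated on a plain nested dict, a toy stand-in for the mirror's node tree:

```python
def delete_path(tree, path, should_prune=False):
    """Toy model of the pruning described above: TREE is a nested dict
    of directories.  With should_prune set, deleting the last entry of
    a directory also removes the now-empty parent, like the -P option
    to 'cvs checkout'."""
    components = path.split('/')
    parent = tree
    for component in components[:-1]:
        parent = parent[component]
    del parent[components[-1]]
    # Recurse upward while the parent directory is left empty; each
    # step re-walks the path from the root, hence O(n^2) in the path
    # depth in the worst case, but n is small in practice.
    if should_prune and not parent and '/' in path:
        delete_path(tree, path.rsplit('/', 1)[0], True)
```

Stopping the recursion at the top level matches the real method's refusal to delete the root directory by pruning.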
+
+    In the youngest revision of the repository, DEST_PATH's parent
+    *must* exist unless create_parent is specified.  DEST_PATH itself
+    *must not* exist.
+
+    Return the new node at DEST_PATH.  Note that this node is not
+    necessarily writable, though its parent node necessarily is."""
+
+    # Get the node of our src_path
+    src_node = self._open_readonly_node(src_path, src_revnum)

     # Get the parent path and the base path of the dest_path
     (dest_parent, dest_basename,) = path_split(dest_path)
-    dest_parent_key, dest_parent_contents = \
-        self._open_writable_node(dest_parent, False)
+    dest_parent_node = self._open_writable_node(dest_parent, create_parent)

-    if dest_parent_key is None:
+    if dest_parent_node is None:
       raise self.SVNRepositoryMirrorParentMissingError(
           "Attempt to add path '%s' to repository mirror, "
           "but its parent directory doesn't exist in the mirror."
           % dest_path)
-    elif dest_basename in dest_parent_contents:
+    elif dest_basename in dest_parent_node:
       raise self.SVNRepositoryMirrorPathExistsError(
           "Attempt to add path '%s' to repository mirror "
           "when it already exists in the mirror."
           % dest_path)

-    dest_parent_contents[dest_basename] = src_key
+    dest_parent_node[dest_basename] = src_node

     self._invoke_delegates('copy_path', src_path, dest_path, src_revnum)

-    # Yes sir, src_key and src_contents are also the contents of the
-    # destination.  This is a cheap copy, remember!  :-)
-    return src_key, src_contents
+    # This is a cheap copy, so src_node has the same contents as the
+    # new destination node.  But we have to get it from its parent
+    # node again so that its path is correct.
+    return dest_parent_node[dest_basename]

-  def fill_symbol(self, symbol):
-    """Performs all copies necessary to create as much of the the tag
+  def fill_symbol(self, svn_symbol_commit):
+    """Perform all copies necessary to create as much of the tag
     or branch SYMBOL as possible given the current revision of the
     repository mirror.

     SYMBOL is an instance of TypedSymbol.
@@ -331,149 +415,112 @@ class SVNRepositoryMirror:
     repository by the end of this call, even if there are no paths
     under it."""

-    symbol_fill = self.symbolings_reader.filling_guide_for_symbol(
-        symbol, self.youngest)
-    # Get the list of sources for the symbolic name.
-    sources = symbol_fill.get_sources()
-
-    if sources:
-      if isinstance(symbol, TagSymbol):
-        dest_prefix = symbol.project.get_tag_path(symbol)
-      else:
-        assert isinstance(symbol, BranchSymbol)
-        dest_prefix = symbol.project.get_branch_path(symbol)
+    symbol = svn_symbol_commit.symbol

-      dest_key = self._open_writable_node(dest_prefix, False)[0]
-      self._fill(symbol_fill, dest_prefix, dest_key, sources)
-    else:
-      # We can only get here for a branch whose first commit is an add
-      # (as opposed to a copy).
-      dest_path = symbol.project.get_branch_path(symbol)
-      if not self.path_exists(dest_path):
-        # If our symbol_fill was empty, that means that our first
-        # commit on the branch was to a file added on the branch, and
-        # that this is our first fill of that branch.
-        #
-        # This case is covered by test 16.
-        #
-        # ...we create the branch by copying trunk from the our
-        # current revision number minus 1
-        source_path = symbol.project.trunk_path
-        entries = self.copy_path(source_path, dest_path, self.youngest - 1)[1]
-        # Now since we've just copied trunk to a branch that's
-        # *supposed* to be empty, we delete any entries in the
-        # copied directory.
-        for entry in entries:
-          del_path = dest_path + '/' + entry
-          # Delete but don't prune.
-          self.delete_path(del_path)
-      else:
-        raise self.SVNRepositoryMirrorInvalidFillOperationError(
-            "Error filling branch '%s'.\n"
-            "Received an empty SymbolFillingGuide and\n"
-            "attempted to create a branch that already exists."
-            % symbol.get_clean_name())
+    # Get the set of sources for the symbolic name:
+    source_set = self._symbolings_reader.get_source_set(
+        svn_symbol_commit, self._youngest
+        )
+
+    if not source_set:
+      raise InternalError(
+          'fill_symbol() called for %s with empty source set' % (symbol,)
+          )

-  def _fill(self, symbol_fill, dest_prefix, dest_key, sources,
-            path = None, parent_source_prefix = None,
-            preferred_revnum = None, prune_ok = None):
-    """Fill the tag or branch at DEST_PREFIX + PATH with items from
-    SOURCES, and recurse into the child items.
-
-    DEST_PREFIX is the prefix of the destination directory, e.g.
-    '/tags/my_tag' or '/branches/my_branch', and SOURCES is a list of
-    FillSource classes that are candidates to be copied to the
-    destination.  DEST_KEY is the key in self.nodes_db to the
-    destination, or None if the destination does not yet exist.
-
-    PATH is the path relative to DEST_PREFIX.  If PATH is None, we
-    are at the top level, e.g. '/tags/my_tag'.
-
-    PARENT_SOURCE_PREFIX is the source prefix that was used to copy
-    the parent directory, and PREFERRED_REVNUM is an int which is the
-    source revision number that the caller (who may have copied KEY's
-    parent) used to perform its copy.  If PREFERRED_REVNUM is None,
-    then no revision is preferable to any other (which probably means
-    that no copies have happened yet).
+    dest_node = self._open_writable_node(symbol.get_path(), False)
+    self._fill(symbol, dest_node, source_set)
+
+  def _prune_extra_entries(self, dest_path, dest_node, src_entries):
+    """Delete any entries in DEST_NODE that are not in SRC_ENTRIES.
+
+    This might require creating a new writable node, so return a
+    possibly-modified dest_node."""
+
+    delete_list = [
+        component
+        for component in dest_node
+        if component not in src_entries]
+    if delete_list:
+      if not isinstance(dest_node, _WritableMirrorNode):
+        dest_node = self._open_writable_node(dest_path, False)
+      # Sort the delete list so that the output is in a consistent
+      # order:
+      delete_list.sort()
+      for component in delete_list:
+        dest_node.delete_component(component)
+    return dest_node
+
+  def _fill(self, symbol, dest_node, source_set,
+            parent_source=None, prune_ok=False):
+    """Fill the tag or branch SYMBOL at the path indicated by SOURCE_SET.
+
+    Use items from SOURCE_SET, and recurse into the child items.
+
+    Fill SYMBOL starting at the path SYMBOL.get_path(SOURCE_SET.path).
+    DEST_NODE is the node of this destination path, or None if the
+    destination does not yet exist.  All directories above this path
+    have already been filled.  SOURCE_SET is a list of FillSource
+    classes that are candidates to be copied to the destination.
+
+    PARENT_SOURCE is the source that was best for the parent
+    directory.  (Note that the parent directory wasn't necessarily
+    copied in this commit, but PARENT_SOURCE was chosen anyway.)  We
+    prefer to copy from the same source as was used for the parent,
+    since it typically requires less touching-up.  If PARENT_SOURCE is
+    None, then this is the top-level directory, and no revision is
+    preferable to any other (which probably means that no copies have
+    happened yet).

     PRUNE_OK means that a copy has been made in this recursion, and
     it's safe to prune directories that are not in
-    SYMBOL_FILL._node_tree, provided that said directory has a source
-    prefix of one of the PARENT_SOURCE_PREFIX.
+    SYMBOL_FILL._node_tree.
-    PATH, PARENT_SOURCE_PREFIX, PRUNE_OK, and PREFERRED_REVNUM
-    should only be passed in by recursive calls."""
+    PARENT_SOURCE, and PRUNE_OK should only be passed in by recursive
+    calls."""

-    # Calculate scores and revnums for all sources
-    for source in sources:
-      src_revnum, score = symbol_fill.get_best_revnum(source.node,
-                                                      preferred_revnum)
-      source.set_score(score, src_revnum)
-
-    # Sort the sources in descending score order so that we will make
-    # a eventual copy from the source with the highest score.
-    sources.sort()
-    copy_source = sources[0]
+    copy_source = source_set.get_best_source()

-    src_path = path_join(copy_source.prefix, path)
-    dest_path = path_join(dest_prefix, path)
+    src_path = path_join(copy_source.prefix, source_set.path)
+    dest_path = symbol.get_path(source_set.path)

     # Figure out if we shall copy to this destination and delete any
     # destination path that is in the way.
-    do_copy = 0
-    if dest_key is None:
-      do_copy = 1
-    elif prune_ok and (parent_source_prefix != copy_source.prefix or
-                       copy_source.revnum != preferred_revnum):
-      # We are about to replace the destination, so we need to remove
-      # it before we perform the copy.
+    if dest_node is None:
+      # The destination does not exist at all, so it definitely has to
+      # be copied:
+      do_copy = True
+    elif prune_ok and (
+          parent_source is None
+          or copy_source.prefix != parent_source.prefix
+          or copy_source.revnum != parent_source.revnum):
+      # The parent path was copied from a different source than we
+      # need to use, so we have to delete the version that was copied
+      # with the parent before we can re-copy from the correct source:
       self.delete_path(dest_path)
-      do_copy = 1
+      do_copy = True
+    else:
+      do_copy = False

     if do_copy:
-      dest_key, dest_entries = self.copy_path(src_path, dest_path,
-                                              copy_source.revnum)
-      prune_ok = 1
-    else:
-      dest_entries = self._get_node(dest_key)
+      dest_node = self.copy_path(src_path, dest_path, copy_source.revnum)
+      prune_ok = True

-    # Create the SRC_ENTRIES hash from SOURCES.  The keys are path
-    # elements and the values are lists of FillSource classes where
-    # this path element exists.
-    src_entries = {}
-    for source in sources:
-      if isinstance(source.node, SVNRevisionRange):
-        continue
-      for entry, node in source.node.items():
-        src_entries.setdefault(entry, []).append(
-            FillSource(source.project, source.prefix, node))
+    # Get the map {entry : FillSourceSet} for entries within this
+    # directory that need filling.
+    src_entries = source_set.get_subsource_sets(copy_source)

     if prune_ok:
-      # Delete the entries in DEST_ENTRIES that are not in src_entries.
-      delete_list = [ ]
-      for entry in dest_entries:
-        if entry not in src_entries:
-          delete_list.append(entry)
-      if delete_list:
-        if dest_key not in self.new_nodes:
-          dest_key, dest_entries = self._open_writable_node(dest_path, True)
-        # Sort the delete list to get "diffable" dumpfiles.
-        delete_list.sort()
-        for entry in delete_list:
-          self._fast_delete_path(dest_path, dest_entries, entry)
+      dest_node = self._prune_extra_entries(dest_path, dest_node, src_entries)

     # Recurse into the SRC_ENTRIES keys sorted in alphabetical order.
-    src_keys = src_entries.keys()
-    src_keys.sort()
-    for src_key in src_keys:
-      next_dest_key = dest_entries.get(src_key, None)
-      self._fill(symbol_fill, dest_prefix, next_dest_key,
-                 src_entries[src_key], path_join(path, src_key),
-                 copy_source.prefix, sources[0].revnum, prune_ok)
+    entries = src_entries.keys()
+    entries.sort()
+    for entry in entries:
+      self._fill(symbol, dest_node[entry], src_entries[entry],
+                 copy_source, prune_ok)

   def add_delegate(self, delegate):
-    """Adds DELEGATE to self.delegates.
+    """Adds DELEGATE to self._delegates.

     For every delegate you add, as soon as SVNRepositoryMirror
     performs a repository action method, SVNRepositoryMirror will call
@@ -481,22 +528,24 @@ class SVNRepositoryMirror:
     delegates will be called in the order that they are added.

     See SVNRepositoryMirrorDelegate for more information."""

-    self.delegates.append(delegate)
+    self._delegates.append(delegate)

   def _invoke_delegates(self, method, *args):
     """Iterate through each of our delegates, in the order that they
     were added, and call the delegate's method named METHOD with the
     arguments in ARGS."""

-    for delegate in self.delegates:
+    for delegate in self._delegates:
       getattr(delegate, method)(*args)

-  def finish(self):
-    """Calls the delegate finish method."""
+  def close(self):
+    """Call the delegate finish methods and close databases."""

     self._invoke_delegates('finish')
-    self.revs_db = None
-    self.nodes_db = None
+    self._svn_revs_root_nodes.close()
+    self._svn_revs_root_nodes = None
+    self._nodes_db.close()
+    self._nodes_db = None


 class SVNRepositoryMirrorDelegate:
@@ -541,6 +590,12 @@ class SVNRepositoryMirrorDelegate:

     raise NotImplementedError

+  def skip_path(self, cvs_rev):
+    """CVS_REV is a CVSRevision; see subclass implementation for
+    details."""
+
+    raise NotImplementedError
+
   def delete_path(self, path):
     """PATH is a string; see subclass implementation for details."""

diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbol.py cvs2svn-2.0.0/cvs2svn_lib/symbol.py
--- cvs2svn-1.5.x/cvs2svn_lib/symbol.py	2006-08-27 17:39:09.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/symbol.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -14,38 +14,128 @@
 # history and logs, available at http://cvs2svn.tigris.org/.
 # ====================================================================

-"""This module contains classes to represent symbols."""
+"""This module contains classes that represent trunk, branches, and tags.
+
+The classes in this module represent lines of development, or LODs for
+short.  Trunk, Branches, and Tags are all LODs.
+
+Symbols include Branches and Tags.  Each Symbol has an identifier that
+is unique across the whole conversion, and multiple instances
+representing the same abstract Symbol have the same identifier.  The
+Symbols in one project are distinct from those in another project, and
+have non-overlapping ids.  Even if, for example, two projects each
+have branches with the same name, the branches are considered
+distinct.
+
+Prior to CollateSymbolsPass, it is not known which symbols will be
+converted as branches and which as tags.  In this phase, the symbols
+are all represented by instances of the non-specific Symbol class.
+During CollateSymbolsPass, the Symbol instances are replaced by
+instances of Branch or Tag.  But the ids are preserved even when the
+symbols are converted.  (This is important to avoid having to rewrite
+databases with new symbol ids in CollateSymbolsPass.)  In particular,
+it is possible that a Symbol, Branch, and Tag instance all have the
+same id, in which case they are all considered equal.
+
+Trunk instances do not have ids, but Trunk objects can be compared to
+Symbol objects (trunks always compare less than symbols)."""

 from cvs2svn_lib.boolean import *
 from cvs2svn_lib.context import Ctx
+from cvs2svn_lib.common import path_join


-class Symbol:
-  def __init__(self, id, project, name):
+class LineOfDevelopment:
+  """Base class for Trunk, Branch, and Tag."""
+
+  def get_path(self, *components):
+    """Return the svn path for this LineOfDevelopment."""
+
+    raise NotImplementedError()
+
+
+class Trunk(LineOfDevelopment):
+  """Represent the main line of development."""
+
+  def __init__(self, id, project):
     self.id = id
     self.project = project
-    self.name = name
+
+  def __getstate__(self):
+    return (self.id, self.project.id,)
+
+  def __setstate__(self, state):
+    (self.id, project_id,) = state
+    self.project = Ctx().projects[project_id]
+
+  def __eq__(self, other):
+    return isinstance(other, Trunk) and self.project == other.project

   def __cmp__(self, other):
-    return cmp(self.project, other.project) or cmp(self.id, other.id)
+    if isinstance(other, Trunk):
+      return cmp(self.project, other.project)
+    else:
+      # Allow Trunk to compare less than Symbols:
+      return -1

   def __hash__(self):
-    return hash( (self.project, self.id,) )
+    return hash(self.project)
+
+  def get_path(self, *components):
+    return self.project.get_trunk_path(*components)

   def __str__(self):
-    return self.name
+    """For convenience only.  The format is subject to change at any time."""
+
+    return 'Trunk'

   def __repr__(self):
     return '%s <%x>' % (self, self.id,)

+
+class Symbol:
+  def __init__(self, id, project, name):
+    self.id = id
+    self.project = project
+    self.name = name
+
+    # If this symbol has a preferred parent, this member is the id of
+    # the LineOfDevelopment instance representing it.  If the symbol
+    # never appeared in a CVSTag or CVSBranch (for example, because
+    # all of the branches on this LOD have been detached from the
+    # dependency tree), then this field is set to None.
+    # This field is set during FilterSymbolsPass.
+    self.preferred_parent_id = None
+
   def __getstate__(self):
-    return (self.id, self.project.id, self.name,)
+    return (self.id, self.project.id, self.name, self.preferred_parent_id,)

   def __setstate__(self, state):
-    (self.id, project_id, self.name,) = state
+    (self.id, project_id, self.name, self.preferred_parent_id,) = state
     self.project = Ctx().projects[project_id]

+  def __eq__(self, other):
+    return isinstance(other, Symbol) and self.id == other.id
+
+  def __cmp__(self, other):
+    if isinstance(other, Symbol):
+      return cmp(self.project, other.project) \
+             or cmp(self.name, other.name) \
+             or cmp(self.id, other.id)
+    else:
+      # Allow Symbols to compare greater than Trunk:
+      return +1
+
+  def __hash__(self):
+    return self.id
+
+  def __str__(self):
+    return self.name
+
+  def __repr__(self):
+    return '%s<%x>' % (self, self.id,)
+
   def get_clean_name(self):
     """Return self.name, translating characters that Subversion does
     not allow in a pathname.
@@ -67,24 +157,38 @@ class TypedSymbol(Symbol):

     Symbol.__init__(self, symbol.id, symbol.project, symbol.name)


-class BranchSymbol(TypedSymbol):
+class IncludedSymbol(TypedSymbol, LineOfDevelopment):
+  """A TypedSymbol that will be included in the conversion."""
+
+  pass
+
+
+class Branch(IncludedSymbol):
+  """An object that describes a CVS branch."""
+
+  def get_path(self, *components):
+    return self.project.get_branch_path(self, *components)
+
   def __str__(self):
     """For convenience only.
The format is subject to change at any time."""

-    return 'Branch %r' % (self.name,)
+    return 'Branch(%r)' % (self.name,)
+
+
+class Tag(IncludedSymbol):
+  def get_path(self, *components):
+    return self.project.get_tag_path(self, *components)

-class TagSymbol(TypedSymbol):
   def __str__(self):
     """For convenience only.  The format is subject to change at any time."""

-    return 'Tag %r' % (self.name,)
+    return 'Tag(%r)' % (self.name,)


 class ExcludedSymbol(TypedSymbol):
   def __str__(self):
     """For convenience only.  The format is subject to change at any time."""

-    return 'ExcludedSymbol %r' % (self.name,)
+    return 'ExcludedSymbol(%r)' % (self.name,)
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbol_database.py cvs2svn-2.0.0/cvs2svn_lib/symbol_database.py
--- cvs2svn-1.5.x/cvs2svn_lib/symbol_database.py	2006-08-20 14:12:27.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/symbol_database.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -20,8 +20,10 @@
 import cPickle

 from cvs2svn_lib.boolean import *
+from cvs2svn_lib.log import Log
 from cvs2svn_lib import config
 from cvs2svn_lib.artifact_manager import artifact_manager
+from cvs2svn_lib.symbol import Trunk


 class SymbolDatabase:
@@ -48,12 +50,15 @@ class SymbolDatabase:

     return self._symbols[id]

+  def close(self):
+    self._symbols = None
+

 def create_symbol_database(symbols):
   """Create and fill a symbol database.
Record each symbol that is listed in SYMBOLS, which is an iterable
-  containing TypedSymbol objects."""
+  containing Trunk and TypedSymbol objects."""

   f = open(artifact_manager.get_temp_file(config.SYMBOL_DB), 'wb')
   cPickle.dump(symbols, f, -1)
diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbol_filling_guide.py cvs2svn-2.0.0/cvs2svn_lib/symbol_filling_guide.py
--- cvs2svn-1.5.x/cvs2svn_lib/symbol_filling_guide.py	2006-09-02 10:44:50.000000000 +0200
+++ cvs2svn-2.0.0/cvs2svn_lib/symbol_filling_guide.py	2007-08-15 22:53:53.000000000 +0200
@@ -1,7 +1,7 @@
 # (Be in -*- python -*- mode.)
 #
 # ====================================================================
-# Copyright (c) 2000-2006 CollabNet.  All rights reserved.
+# Copyright (c) 2000-2007 CollabNet.  All rights reserved.
 #
 # This software is licensed as described in the file COPYING, which
 # you should have received as part of this distribution.  The terms
@@ -14,218 +14,364 @@
 # history and logs, available at http://cvs2svn.tigris.org/.
 # ====================================================================

-"""This module contains database facilities used by cvs2svn."""
+"""This module contains classes to help choose symbol sources."""

+from __future__ import generators
+
+import bisect
+
 from cvs2svn_lib.boolean import *
+from cvs2svn_lib.set_support import *
 from cvs2svn_lib.common import path_join
 from cvs2svn_lib.common import path_split
 from cvs2svn_lib.common import FatalError
 from cvs2svn_lib.common import SVN_INVALID_REVNUM
-from cvs2svn_lib.context import Ctx
 from cvs2svn_lib.svn_revision_range import SVNRevisionRange
-from cvs2svn_lib.fill_source import FillSource


-class SymbolFillingGuide:
-  """A node tree representing the source paths to be copied to fill
-  self.name in the current SVNCommit.
+class _RevisionScores:
+  """Represent the scores for a range of revisions."""

-  self._node_tree is the root of the directory tree, in the form {
-  path_component : subnode }.
Leaf nodes are instances of
-  SVNRevisionRange.  Intermediate (directory) nodes are dictionaries
-  mapping relative names to subnodes.
+  def __init__(self, svn_revision_ranges):
+    """Initialize based on SVN_REVISION_RANGES.

-  By walking self._node_tree and calling self.get_best_revnum() on
-  each node, the caller can determine what subversion revision number
-  to copy the path corresponding to that node from.  self._node_tree
-  should be treated as read-only.
+    SVN_REVISION_RANGES is a list of SVNRevisionRange objects.

-  The caller can then descend to sub-nodes to see if their "best
-  revnum" differs from their parents' and if it does, take appropriate
-  actions to "patch up" the subtrees."""
+    The score of an svn revision is defined to be the number of
+    SVNRevisionRanges that include the revision.  A score thus
+    indicates that copying the corresponding revision (or any
+    following revision up to the next revision in the list) of the
+    object in question would yield that many correct paths at or
+    underneath the object.  There may be other paths underneath it
+    which are not correct and would need to be deleted or recopied;
+    those can only be detected by descending and examining their
+    scores.

-  def __init__(self, openings_closings_map):
-    """Initializes a SymbolFillingGuide for OPENINGS_CLOSINGS_MAP and
-    store into it the openings and closings from
-    OPENINGS_CLOSINGS_MAP."""
+    If SVN_REVISION_RANGES is empty, then all scores are undefined."""

-    self.symbol = openings_closings_map.symbol
+    # A list that looks like:
+    #
+    #    [(REV1 SCORE1), (REV2 SCORE2), (REV3 SCORE3), ...]
+    #
+    # where the tuples are sorted by revision number and the revision
+    # numbers are distinct.  Score is the number of correct paths that
+    # would result from using the specified revision number (or any
+    # other revision preceding the next revision listed) as a source.
+    # For example, the score of any revision REV in the range REV2 <=
+    # REV < REV3 is equal to SCORE2.
+    self.scores = []

-    # The dictionary that holds our node tree as a map { node_key :
-    # node }.
-    self._node_tree = { }
+    # First look for easy out.
+    if not svn_revision_ranges:
+      return

-    for svn_path, svn_revision_range in openings_closings_map.get_things():
-      (head, tail) = path_split(svn_path)
-      self._get_node_for_path(head)[tail] = svn_revision_range
+    # Create lists of opening and closing revisions along with the
+    # corresponding delta to the total score:
+    openings = [ (x.opening_revnum, +1)
+                 for x in svn_revision_ranges ]
+    closings = [ (x.closing_revnum, -1)
+                 for x in svn_revision_ranges
+                 if x.closing_revnum is not None ]

-    #self.print_node_tree(self._node_tree)
+    things = openings + closings
+    # Sort by revision number:
+    things.sort()
+    # Initialize output list with zeroth element of things.  This
+    # element must exist, because it was verified that
+    # svn_revision_ranges (and therefore openings) is not empty.
+    self.scores = [ things[0] ]
+    total = things[0][1]
+    for (rev, change) in things[1:]:
+      total += change
+      if rev == self.scores[-1][0]:
+        # Same revision as last entry; modify last entry:
+        self.scores[-1] = (rev, total)
+      else:
+        # Previously-unseen revision; create new entry:
+        self.scores.append((rev, total))

-  def _get_node_for_path(self, svn_path):
-    """Return the node key for svn_path, creating new nodes as needed."""
+  def get_score(self, rev):
+    """Return the score for svn revision REV.

-    # Walk down the path, one node at a time.
-    node = self._node_tree
-    for component in svn_path.split('/'):
-      if component in node:
-        node = node[component]
-      else:
-        old_node = node
-        node = {}
-        old_node[component] = node
+    If REV doesn't appear explicitly in self.scores, use the score of
+    the highest revision preceding REV.
If there are no preceding
+    revisions, then the score for REV is unknown; in this case, return
+    -1."""

-    return node
+    # Remember, according to the tuple sorting rules,
+    #
+    #    (rev, anything,) < (rev+1,) < (rev+1, anything,)
+    predecessor_index = bisect.bisect(self.scores, (rev+1,)) - 1

-  def get_best_revnum(self, node, preferred_revnum):
-    """Determine the best subversion revision number to use when
-    copying the source tree beginning at NODE.  Returns a
-    subversion revision number.
+    if predecessor_index < 0:
+      # raise ValueError('Score for revision %s is unknown' % rev)
+      return -1

-    PREFERRED_REVNUM is passed to best_rev and used to calculate the
-    best_revnum."""
+    return self.scores[predecessor_index][1]

-    def score_revisions(svn_revision_ranges):
-      """Return a list of revisions and scores based on
-      SVN_REVISION_RANGES.  The returned list looks like:
+  def get_best_revnum(self):
+    """Find the revnum with the highest score.

-         [(REV1 SCORE1), (REV2 SCORE2), ...]
+    Return (revnum, score) for the revnum with the highest score.  If
+    the highest score is shared by multiple revisions, select the
+    oldest revision."""

-      where the tuples are sorted by revision number.
-      SVN_REVISION_RANGES is a list of SVNRevisionRange objects.
+    best_revnum = SVN_INVALID_REVNUM
+    best_score = 0
+    for revnum, score in self.scores:
+      if score > best_score:
+        best_score = score
+        best_revnum = revnum
+    return best_revnum, best_score

-      For each svn revision that appears as either an opening_revnum
-      or closing_revnum for one of the svn_revision_ranges, output a
-      tuple indicating how many of the SVNRevisionRanges include that
-      svn_revision in its range.  A score thus indicates that copying
-      the corresponding revision (or any following revision up to the
-      next revision in the list) of the object in question would yield
-      that many correct paths at or underneath the object.
There may
-      be other paths underneath it which are not correct and would
-      need to be deleted or recopied; those can only be detected by
-      descending and examining their scores.

-      If OPENINGS is empty, return the empty list."""
+class FillSource:
+  """Representation of a fill source.

-      openings = [ x.opening_revnum
-                   for x in svn_revision_ranges ]
-      closings = [ x.closing_revnum
-                   for x in svn_revision_ranges
-                   if x.closing_revnum is not None ]
+  A fill source is a directory (either trunk or a branches
+  subdirectory) that can be used as a source for a symbol, along with
+  the self-computed score for the source.  FillSources can be
+  compared; the comparison is such that it sorts FillSources in
+  descending order by score (higher score implies smaller).

-      # First look for easy out.
-      if not openings:
-        return []
+  These objects are used by the symbol filler in SVNRepositoryMirror."""

-      # Create a list with both openings (which increment the total)
-      # and closings (which decrement the total):
-      things = [(rev,1) for rev in openings] + [(rev,-1) for rev in closings]
-      # Sort by revision number:
-      things.sort()
-      # Initialize output list with zeroth element of things.  This
-      # element must exist, because it was already verified that
-      # openings is not empty.
-      scores = [ things[0] ]
-      total = scores[-1][1]
-      for (rev, change) in things[1:]:
-        total += change
-        if rev == scores[-1][0]:
-          # Same revision as last entry; modify last entry:
-          scores[-1] = (rev, total)
-        else:
-          # Previously-unseen revision; create new entry:
-          scores.append((rev, total))
-      return scores
+  def __init__(self, symbol, prefix, node, preferred_source=None):
+    """Create a scored fill source with a prefix and a key."""
+
+    # The Symbol instance for the symbol to be filled:
+    self._symbol = symbol
+
+    # The svn path that is the base of this source (e.g.,
+    # 'project1/trunk' or 'project1/branches/BRANCH1'):
+    self.prefix = prefix

-    def best_rev(scores, preferred_rev):
-      """Return the revision with the highest score from SCORES, a list
-      returned by score_revisions().  When the maximum score is shared
-      by multiple revisions, the oldest revision is selected, unless
-      PREFERRED_REV is one of the possibilities, in which case, it is
-      selected."""
-
-      max_score = 0
-      preferred_rev_score = -1
-      rev = SVN_INVALID_REVNUM
-      if preferred_rev is None:
-        # Comparison order of different types is arbitrary.  Do not
-        # expect None to compare less than int values below.
-        preferred_rev = SVN_INVALID_REVNUM
-      for revnum, count in scores:
-        if count > max_score:
-          max_score = count
-          rev = revnum
-        if revnum <= preferred_rev:
-          preferred_rev_score = count
-      if preferred_rev_score == max_score:
-        rev = preferred_rev
-      return rev, max_score
+    # The node in the _SymbolFillingGuide corresponding to the prefix
+    # path:
+    self.node = node

-    # Aggregate openings and closings from the rev tree
-    svn_revision_ranges = self._list_revnums(node)
+    # The source that we should prefer to use, or None if there is no
+    # preference:
+    self._preferred_source = preferred_source
+
+    # SCORE is the score of this source; REVNUM is the revision number
+    # with the best score:
+    self.revnum, self.score = self._get_best_revnum()
+
+  def _get_best_revnum(self):
+    """Determine the best subversion revision number to use when
+    copying the source tree beginning at this source.
+
+    Return (revnum, score) for the best revision found.  If
+    SELF._preferred_source is not None and its revision number is
+    among the revision numbers with the best scores, return it;
+    otherwise, return the oldest such revision."""
+
+    # Aggregate openings and closings from our rev tree
+    svn_revision_ranges = self._get_revision_ranges(self.node)

     # Score the lists
-    scores = score_revisions(svn_revision_ranges)
+    revision_scores = _RevisionScores(svn_revision_ranges)
+
+    best_revnum, best_score = revision_scores.get_best_revnum()

-    revnum, max_score = best_rev(scores, preferred_revnum)
+    if self._preferred_source is not None \
+           and revision_scores.get_score(self._preferred_source.revnum) \
+               == best_score:
+      best_revnum = self._preferred_source.revnum

-    if revnum == SVN_INVALID_REVNUM:
+    if best_revnum == SVN_INVALID_REVNUM:
       raise FatalError(
           "failed to find a revision to copy from when copying %s"
-          % self.symbol.name)
-    return revnum, max_score
+          % self._symbol.name)
+    return best_revnum, best_score

-  def _list_revnums(self, node):
-    """Return a list of all the SVNRevisionRanges
(including
-    duplicates) for all leaf nodes at and under NODE."""
+  def _get_revision_ranges(self, node):
+    """Return a list of all the SVNRevisionRanges at and under NODE.
+
+    Include duplicates.  This is a helper method used by
+    _get_best_revnum()."""

     if isinstance(node, SVNRevisionRange):
       # It is a leaf node.
       return [ node ]
     else:
       # It is an intermediate node.
-      revnums = []
+      revision_ranges = []
       for key, subnode in node.items():
-        revnums.extend(self._list_revnums(subnode))
-      return revnums
+        revision_ranges.extend(self._get_revision_ranges(subnode))
+      return revision_ranges
+
+  def _get_subsource(self, node, preferred_source):
+    """Return the FillSource for the specified NODE."""
+
+    return FillSource(self._symbol, self.prefix, node, preferred_source)
+
+  def get_subsources(self, preferred_source):
+    """Generate (entry, FillSource) for all direct subsources."""
+
+    if not isinstance(self.node, SVNRevisionRange):
+      for entry, node in self.node.items():
+        yield entry, self._get_subsource(node, preferred_source)
+
+  def __cmp__(self, other):
+    """Comparison operator that sorts FillSources in descending score order.
+
+    If the scores are the same, prefer the source that is taken from
+    the same branch as its preferred_source; otherwise, prefer the one
+    that is on trunk.
If all those are equal then use alphabetical + order by path (to stabilize testsuite results).""" + + trunk_path = self._symbol.project.trunk_path + return cmp(other.score, self.score) \ + or cmp(other._preferred_source is not None + and other.prefix == other._preferred_source.prefix, + self._preferred_source is not None + and self.prefix == self._preferred_source.prefix) \ + or cmp(other.prefix == trunk_path, self.prefix == trunk_path) \ + or cmp(self.prefix, other.prefix) + + +class FillSourceSet: + """A set of FillSources for a given symbol and path.""" + + def __init__(self, symbol, path, sources): + # The symbol that the sources are for: + self._symbol = symbol + + # The path, relative to the source base paths, that is being + # processed: + self.path = path + + # A list of sources, sorted in descending order of score. + self._sources = sources + self._sources.sort() + + def __nonzero__(self): + return bool(self._sources) + + def get_best_source(self): + return self._sources[0] + + def get_subsource_sets(self, preferred_source): + """Return a FillSourceSet for each subentry that still needs filling. + + The return value is a map {entry : FillSourceSet} for subentries + that need filling, where entry is a path element under the path + handled by SELF.""" + + source_entries = {} + for source in self._sources: + for entry, subsource in source.get_subsources(preferred_source): + source_entries.setdefault(entry, []).append(subsource) + + retval = {} + for (entry, source_list) in source_entries.items(): + retval[entry] = FillSourceSet( + self._symbol, path_join(self.path, entry), source_list + ) + + return retval + + +class _SymbolFillingGuide: + """A tree holding the sources that can be copied to fill a symbol. + + The class holds a node tree representing any parts of the svn + directory structure that can be used to incrementally fill the + symbol in the current SVNCommit. The directory nodes in the tree + are dictionaries mapping pathname components to subnodes. 
A leaf + node exists for any potential source that has had an opening since + the last fill of this symbol, and thus can be filled in this commit. + The leaves themselves are SVNRevisionRange objects telling for what + range of revisions the leaf could serve as a source. + + self._node_tree is the root node of the directory tree. By walking + self._node_tree and calling self._get_best_revnum() on each node, + the caller can determine what subversion revision number to copy the + path corresponding to that node from. self._node_tree should be + treated as read-only. + + The caller can then descend to sub-nodes to see if their 'best + revnum' differs from their parent's and if it does, take appropriate + actions to 'patch up' the subtrees.""" - def get_sources(self): - """Return the list of sources for this symbolic name. + def __init__(self, symbol, openings_closings_map): + """Initializes a _SymbolFillingGuide for SYMBOL. - The Project instance defines what are legitimate sources. Raise - an exception if a change occurred outside of the source - directories.""" + SYMBOL is either a Branch or a Tag. Record the openings and + closings from OPENINGS_CLOSINGS_MAP, which is a map {svn_path : + SVNRevisionRange} containing the openings and closings for + svn_paths.""" - return self._get_sub_sources('', self._node_tree) + self.symbol = symbol + + # The dictionary that holds our root node as a map { + # path_component : node }. Subnodes are also dictionaries with + # the same form. + self._node_tree = { } + + for svn_path, svn_revision_range in openings_closings_map.items(): + (head, tail) = path_split(svn_path) + self._get_node_for_path(head)[tail] = svn_revision_range + + #self.print_node_tree(self._node_tree) + + def _get_node_for_path(self, svn_path): + """Return the node for svn_path, creating new nodes as needed.""" + + # Walk down the path, one node at a time. 
+ node = self._node_tree + + for component in svn_path.split('/'): + node = node.setdefault(component, {}) + + return node + + def get_source_set(self): + """Return the list of FillSources for this symbolic name. + + The Project instance defines what are legitimate sources + (basically, the project's trunk or any directory directly under + its branches path). Return a list of FillSource objects, one for + each source that is present in the node tree. Raise an exception + if a change occurred outside of the source directories.""" + + return FillSourceSet( + self.symbol, '', list(self._get_sub_sources('', self._node_tree)) + ) def _get_sub_sources(self, start_svn_path, start_node): - """Return the list of sources for this symbolic name, starting the - search at path START_SVN_PATH, which is node START_NODE. This is - a helper method, called by get_sources() (see).""" + """Generate the sources within SVN_START_PATH. + + Start the search at path START_SVN_PATH, which is node START_NODE. + Generate a sequence of FillSource objects. + + This is a helper method, called by get_source_set() (see).""" if isinstance(start_node, SVNRevisionRange): # This implies that a change was found outside of the # legitimate sources. This should never happen. raise elif self.symbol.project.is_source(start_svn_path): - # This is a legitimate source. Add it to list. - return [ FillSource(self.symbol.project, start_svn_path, start_node) ] + # This is a legitimate source. Output it: + yield FillSource(self.symbol, start_svn_path, start_node) else: # This is a directory that is not a legitimate source. (That's - # OK because it hasn't changed directly.) But directories - # within it have been changed, so we need to search recursively - # to find their enclosing sources. - sources = [] + # OK because it hasn't changed directly.) But one or more + # directories within it have been changed, so we need to search + # recursively to find the sources enclosing them. 
for entry, node in start_node.items(): svn_path = path_join(start_svn_path, entry) - sources.extend(self._get_sub_sources(svn_path, node)) - - return sources + for source in self._get_sub_sources(svn_path, node): + yield source def print_node_tree(self, node, name='/', indent_depth=0): - """For debugging purposes. Prints all nodes in TREE that are - rooted at NODE. INDENT_DEPTH is used to indent the output of - recursive calls.""" + """Print all nodes in TREE that are rooted at NODE to sys.stdout. + + INDENT_DEPTH is used to indent the output of recursive calls. + This method is included for debugging purposes.""" if not indent_depth: print "TREE", "=" * 75 @@ -237,3 +383,7 @@ class SymbolFillingGuide: self.print_node_tree(value, key, (indent_depth + 1)) +def get_source_set(symbol, openings_closings_map): + return _SymbolFillingGuide(symbol, openings_closings_map).get_source_set() + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbol_statistics.py cvs2svn-2.0.0/cvs2svn_lib/symbol_statistics.py --- cvs2svn-1.5.x/cvs2svn_lib/symbol_statistics.py 2006-08-20 14:12:27.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/symbol_statistics.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -14,7 +14,7 @@ # history and logs, available at http://cvs2svn.tigris.org/. 
# ==================================================================== -"""This module gathers and processes statistics about CVS symbols.""" +"""This module gathers and processes statistics about lines of development.""" import sys import cPickle @@ -23,52 +23,179 @@ from cvs2svn_lib.boolean import * from cvs2svn_lib.set_support import * from cvs2svn_lib import config from cvs2svn_lib.common import error_prefix +from cvs2svn_lib.common import InternalError from cvs2svn_lib.log import Log from cvs2svn_lib.artifact_manager import artifact_manager +from cvs2svn_lib.symbol import Trunk from cvs2svn_lib.symbol import Symbol -from cvs2svn_lib.symbol import TagSymbol +from cvs2svn_lib.symbol import Tag from cvs2svn_lib.symbol import ExcludedSymbol +from cvs2svn_lib.symbol import TypedSymbol +from cvs2svn_lib.cvs_item import CVSBranch +from cvs2svn_lib.cvs_item import CVSTag class _Stats: """A summary of information about a symbol (tag or branch). Members: - symbol -- the Symbol instance of the symbol being described - tag_create_count -- the number of files on which this symbol - appears as a tag + lod -- the LineOfDevelopment instance of the lod being described - branch_create_count -- the number of files on which this symbol + tag_create_count -- the number of files in which this lod appears + as a tag + + branch_create_count -- the number of files in which this lod appears as a branch - branch_commit_count -- the number of commits on this branch + branch_commit_count -- the number of files in which there were + commits on this lod branch_blockers -- a set of Symbol instances for any symbols that - sprout from a branch with this name.""" + sprout from a branch with this name. 
+
+    possible_parents -- a map {LineOfDevelopment : count} indicating
+    in how many files each LOD could have served as the parent of
+    self.lod."""
 
-  def __init__(self, symbol):
-    self.symbol = symbol
+  def __init__(self, lod):
+    self.lod = lod
     self.tag_create_count = 0
     self.branch_create_count = 0
     self.branch_commit_count = 0
     self.branch_blockers = set()
+    self.possible_parents = { }
+
+  def register_tag_creation(self):
+    """Register the creation of this lod as a tag."""
+
+    self.tag_create_count += 1
+
+  def register_branch_creation(self):
+    """Register the creation of this lod as a branch."""
+
+    self.branch_create_count += 1
+
+  def register_branch_commit(self):
+    """Register that there were commit(s) on this branch in one file."""
+
+    self.branch_commit_count += 1
+
+  def register_branch_blocker(self, blocker):
+    """Register BLOCKER as preventing this symbol from being deleted.
+
+    BLOCKER is a tag or a branch that springs from a revision on this
+    symbol."""
+
+    self.branch_blockers.add(blocker)
+
+  def register_possible_parent(self, lod):
+    """Register that LOD was a possible parent for SELF.lod in a file."""
+
+    self.possible_parents[lod] = self.possible_parents.get(lod, 0) + 1
+
+  def register_branch_possible_parents(self, cvs_branch, cvs_file_items):
+    """Register any possible parents of this symbol from CVS_BRANCH."""
+
+    # This routine is a bottleneck.  So we define some local variables
+    # to speed up access to frequently-needed variables.
+    register = self.register_possible_parent
+    parent_cvs_rev = cvs_file_items[cvs_branch.source_id]
+
+    # The "obvious" parent of a branch is the branch holding the
+    # revision where the branch is rooted:
+    register(parent_cvs_rev.lod)
+
+    # Any other branches that are rooted at the same revision and
+    # were committed earlier than the branch are also possible
+    # parents:
+    symbol = cvs_branch.symbol
+    for branch_id in parent_cvs_rev.branch_ids:
+      parent_symbol = cvs_file_items[branch_id].symbol
+      # A branch cannot be its own parent, nor can a branch's
+      # parent be a branch that was created after it.  So we stop
+      # iterating when we reach the branch whose parents we are
+      # collecting:
+      if parent_symbol == symbol:
+        break
+      register(parent_symbol)
+
+  def register_tag_possible_parents(self, cvs_tag, cvs_file_items):
+    """Register any possible parents of this symbol from CVS_TAG."""
+
+    # This routine is a bottleneck.  So use local variables to speed
+    # up access to frequently-needed objects.
+    register = self.register_possible_parent
+    parent_cvs_rev = cvs_file_items[cvs_tag.source_id]
+
+    # The "obvious" parent of a tag is the branch holding the
+    # revision where the tag is rooted:
+    register(parent_cvs_rev.lod)
+
+    # Branches that are rooted at the same revision are also
+    # possible parents:
+    for branch_id in parent_cvs_rev.branch_ids:
+      parent_symbol = cvs_file_items[branch_id].symbol
+      register(parent_symbol)
+
+  def is_ghost(self):
+    """Return True iff this lod never really existed."""
+
+    return (
+        not isinstance(self.lod, Trunk)
+        and self.branch_commit_count == 0
+        and not self.branch_blockers
+        and not self.possible_parents
+        )
+
+  def get_preferred_parents(self):
+    """Return the LinesOfDevelopment preferred as parents for this lod.
+
+    Return the tuple (BEST_SYMBOLS, BEST_COUNT), where BEST_SYMBOLS is
+    the set of LinesOfDevelopment that appeared most often as possible
+    parents, and BEST_COUNT is the number of times those symbols
+    appeared.
BEST_SYMBOLS might contain multiple symbols if multiple + LinesOfDevelopment have the same count.""" + + best_count = -1 + best_symbols = set() + for (symbol, count) in self.possible_parents.items(): + if count > best_count: + best_count = count + best_symbols.clear() + best_symbols.add(symbol) + elif count == best_count: + best_symbols.add(symbol) + + return (best_symbols, best_count) def __str__(self): return ( '\'%s\' is a tag in %d files, a branch in ' '%d files and has commits in %d files' - % (self.symbol, self.tag_create_count, + % (self.lod, self.tag_create_count, self.branch_create_count, self.branch_commit_count)) + def __repr__(self): + retval = ['%s; %d possible parents:\n' + % (self, len(self.possible_parents))] + parent_counts = self.possible_parents.items() + parent_counts.sort(lambda a,b: - cmp(a[1], b[1])) + for (symbol, count) in parent_counts: + if isinstance(symbol, Trunk): + retval.append(' trunk : %d\n' % count) + else: + retval.append(' \'%s\' : %d\n' % (symbol.name, count)) + return ''.join(retval) + class SymbolStatisticsCollector: - """Collect statistics about symbols. + """Collect statistics about lines of development. - Record a brief summary of information about each symbol in the RCS - files into a database. The database is created in CollectRevsPass - and it is used in CollateSymbolsPass (via the SymbolStatistics - class). + Record a summary of information about each line of development in + the RCS files for later storage into a database. The database is + created in CollectRevsPass and it is used in CollateSymbolsPass (via + the SymbolStatistics class). collect_data._SymbolDataCollector inserts information into instances of this class by by calling its register_*() methods. @@ -76,58 +203,78 @@ class SymbolStatisticsCollector: Its main purpose is to assist in the decisions about which symbols can be treated as branches and tags and which may be excluded. 
- The data collected by this class can be written to a text file - (config.SYMBOL_STATISTICS_LIST).""" + The data collected by this class can be written to the file + config.SYMBOL_STATISTICS.""" def __init__(self): - # A map { symbol -> record } for all symbols (branches and tags) + # A map { lod -> _Stats } for all lines of development: self._stats = { } - def _get_stats(self, symbol): - """Return the _Stats record for SYMBOL. + def __getitem__(self, lod): + """Return the _Stats record for line of development LOD. - Create a new one if necessary.""" + Create and register a new one if necessary.""" try: - return self._stats[symbol] + return self._stats[lod] except KeyError: - stats = _Stats(symbol) - self._stats[symbol] = stats + stats = _Stats(lod) + self._stats[lod] = stats return stats - def register_tag_creation(self, symbol): - """Register the creation of the tag SYMBOL.""" + def register(self, cvs_file_items): + """Register the possible parents for each symbol in CVS_FILE_ITEMS.""" + + for lod_items in cvs_file_items.iter_lods(): + if lod_items.lod is not None: + branch_stats = self[lod_items.lod] + + branch_stats.register_branch_creation() + + if lod_items.cvs_revisions: + branch_stats.register_branch_commit() + + for cvs_tag in lod_items.cvs_tags: + branch_stats.register_branch_blocker(cvs_tag.symbol) - self._get_stats(symbol).tag_create_count += 1 + for cvs_branch in lod_items.cvs_branches: + branch_stats.register_branch_blocker(cvs_branch.symbol) - def register_branch_creation(self, symbol): - """Register the creation of the branch SYMBOL.""" + if lod_items.cvs_branch is not None: + branch_stats.register_branch_possible_parents( + lod_items.cvs_branch, cvs_file_items) - self._get_stats(symbol).branch_create_count += 1 + for cvs_tag in lod_items.cvs_tags: + tag_stats = self[cvs_tag.symbol] - def register_branch_commit(self, symbol): - """Register a commit on the branch SYMBOL.""" + tag_stats.register_tag_creation() - 
self._get_stats(symbol).branch_commit_count += 1 + tag_stats.register_tag_possible_parents(cvs_tag, cvs_file_items) - def register_branch_blocker(self, symbol, blocker): - """Register BLOCKER as a blocker on the branch SYMBOL.""" + def purge_ghost_symbols(self): + """Purge any symbols that don't have any activity. - self._get_stats(symbol).branch_blockers.add(blocker) + Such ghost symbols can arise if a symbol was defined in an RCS + file but pointed at a non-existent revision.""" - def write(self): - """Store the stats database to the SYMBOL_STATISTICS_LIST file.""" + for stats in self._stats.values(): + if stats.is_ghost(): + Log().warn('Deleting ghost symbol: %s' % (stats.lod,)) + del self._stats[stats.lod] - f = open(artifact_manager.get_temp_file(config.SYMBOL_STATISTICS_LIST), - 'wb') + def close(self): + """Store the stats database to the SYMBOL_STATISTICS file.""" + + f = open(artifact_manager.get_temp_file(config.SYMBOL_STATISTICS), 'wb') cPickle.dump(self._stats.values(), f, -1) f.close() + self._stats = None class SymbolStatistics: - """Read and handle symbol statistics. + """Read and handle line of development statistics. - The symbol statistics are read from a database created by + The statistics are read from a database created by SymbolStatisticsCollector. 
This class has methods to process the statistics information and help with decisions about: @@ -144,32 +291,29 @@ class SymbolStatistics: - A non-excluded branch depends on an excluded branch - The data in this class is read from a pickle file - (config.SYMBOL_STATISTICS_LIST).""" - - def __init__(self): - """Read the stats database from the SYMBOL_STATISTICS_LIST file.""" + The data in this class is read from a pickle file.""" - # A hash that maps symbol names to _Stats instances - self._stats_by_name = { } + def __init__(self, filename): + """Read the stats database from FILENAME.""" - # A map { Symbol -> _Stats } for all symbols (branches and tags) + # A map { LineOfDevelopment -> _Stats } for all lines of + # development: self._stats = { } - stats_list = cPickle.load(open(artifact_manager.get_temp_file( - config.SYMBOL_STATISTICS_LIST), 'rb')) + stats_list = cPickle.load(open(filename, 'rb')) for stats in stats_list: - symbol = stats.symbol - self._stats_by_name[symbol.name] = stats - self._stats[symbol] = stats + self._stats[stats.lod] = stats - def get_stats(self, name): - """Return the _Stats object for the symbol named NAME. + def __len__(self): + return len(self._stats) - Raise KeyError if no such name exists.""" + def get_stats(self, lod): + """Return the _Stats object for LineOfDevelopment instance LOD. - return self._stats_by_name[name] + Raise KeyError if no such lod exists.""" + + return self._stats[lod] def __iter__(self): return self._stats.itervalues() @@ -178,24 +322,29 @@ class SymbolStatistics: """Find all excluded symbols that are blocked by non-excluded symbols. Non-excluded symbols are by definition the symbols contained in - SYMBOLS, which is a map { name : Symbol }. Return a map { name : - blocker_names } containing any problems found, where blocker_names - is a set containing the names of blocking symbols.""" + SYMBOLS, which is a map { name : Symbol } not including Trunk + entries. 
Return a map { name : blocker_names } containing any + problems found, where blocker_names is a set containing the names + of blocking symbols.""" blocked_branches = {} for stats in self: - if stats.symbol.name not in symbols: + if isinstance(stats.lod, Trunk): + # Trunk is never excluded + pass + elif stats.lod.name not in symbols: blockers = [ blocker.name for blocker in stats.branch_blockers if blocker.name in symbols ] if blockers: - blocked_branches[stats.symbol.name] = set(blockers) + blocked_branches[stats.lod.name] = set(blockers) return blocked_branches def _check_blocked_excludes(self, symbols): """Check whether any excluded branches are blocked. - A branch can be blocked because it has another, non-excluded - symbol that depends on it. If any blocked excludes are found, + SYMBOLS is a map { name : Symbol } not including Trunk entries. A + branch can be blocked because it has another, non-excluded symbol + that depends on it. If any blocked excludes are found in SYMBOLS, output error messages describing the situation. Return True if any errors were found.""" @@ -217,18 +366,19 @@ class SymbolStatistics: def _check_invalid_tags(self, symbols): """Check for commits on any symbols that are to be converted as tags. - In that case, they can't be converted as tags. If any invalid - tags are found, output error messages describing the problems. - Return True iff any errors are found.""" + SYMBOLS is a map { name : Symbol } not including Trunk entries. + If there is a commit on a symbol, then it cannot be converted as a + tag. If any tags with commits are found, output error messages + describing the problems. 
Return True iff any errors are found.""" Log().quiet("Checking for forced tags with commits...") invalid_tags = [ ] for symbol in symbols.values(): - if isinstance(symbol, TagSymbol): - stats = self.get_stats(symbol.name) + if isinstance(symbol, Tag): + stats = self.get_stats(symbol) if stats.branch_commit_count > 0: - invalid_tags.append(stats.symbol.name) + invalid_tags.append(stats.lod.name) if not invalid_tags: # No problems found: @@ -242,19 +392,48 @@ class SymbolStatistics: return True - def check_consistency(self, symbols): + def check_consistency(self, lods): """Check the plan for how to convert symbols for consistency. - SYMBOLS is an iterable of TypedSymbol objects indicating how each - symbol is to be converted. Return True iff any problems were - detected.""" + LODS is an iterable of Trunk and TypedSymbol objects indicating + how each line of development is to be converted. Return True iff + any problems were detected.""" + + # Keep track of which symbols have not yet been processed: + unprocessed_lods = set(self._stats.keys()) # Create a map { symbol_name : Symbol } including only # non-excluded symbols: symbols_by_name = {} - for symbol in symbols: - if not isinstance(symbol, ExcludedSymbol): - symbols_by_name[symbol.name] = symbol + for lod in lods: + try: + unprocessed_lods.remove(lod) + except KeyError: + if lod in self._stats: + raise InternalError( + 'Symbol %s appeared twice in the symbol conversion table' + % (lod,)) + else: + raise InternalError('Symbol %s is unknown' % (lod,)) + + if isinstance(lod, Trunk): + # Trunk is not processed any further. + pass + elif isinstance(lod, ExcludedSymbol): + # Symbol excluded; don't process it any further. + pass + elif isinstance(lod, TypedSymbol): + # This is an included symbol. Include it in the symbol check. 
+ symbols_by_name[lod.name] = lod + else: + raise InternalError('Symbol %s is of unexpected type' % (lod,)) + + # Make sure that all symbols were processed: + if unprocessed_lods: + raise InternalError( + 'The following symbols did not appear in the symbol conversion ' + 'table: %s' + % (', '.join([str(s) for s in unprocessed_lods]),)) # It is important that we not short-circuit here: return ( @@ -262,4 +441,40 @@ class SymbolStatistics: | self._check_invalid_tags(symbols_by_name) ) + def exclude_symbol(self, symbol): + """SYMBOL has been excluded; remove it from our statistics.""" + + del self._stats[symbol] + + # Remove references to this symbol from other statistics objects: + for stats in self._stats.itervalues(): + stats.branch_blockers.discard(symbol) + if symbol in stats.possible_parents: + del stats.possible_parents[symbol] + + def get_preferred_parents(self): + """Return the LinesOfDevelopment preferred as parents for each symbol. + + Return a map {Symbol : LineOfDevelopment} giving the LOD that + appears most often as a possible parent for each symbol. Do not + include entries for Trunk objects. If a symbol has no possible + parents because it never exists as a CVSBranch or a CVSTag, then + the associated value is None.""" + + retval = {} + for stats in self._stats.itervalues(): + if isinstance(stats.lod, Trunk): + # Trunk entries don't have any parents. + pass + else: + (parents, count) = stats.get_preferred_parents() + if not parents: + retval[stats.lod] = None + else: + parents = list(parents) + parents.sort() + retval[stats.lod] = parents[0] + + return retval + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbol_strategy.py cvs2svn-2.0.0/cvs2svn_lib/symbol_strategy.py --- cvs2svn-1.5.x/cvs2svn_lib/symbol_strategy.py 2006-08-27 19:16:09.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/symbol_strategy.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) 
# # ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -24,8 +24,9 @@ from cvs2svn_lib.set_support import * from cvs2svn_lib.common import FatalError from cvs2svn_lib.common import error_prefix from cvs2svn_lib.log import Log -from cvs2svn_lib.symbol import BranchSymbol -from cvs2svn_lib.symbol import TagSymbol +from cvs2svn_lib.symbol import Trunk +from cvs2svn_lib.symbol import Branch +from cvs2svn_lib.symbol import Tag from cvs2svn_lib.symbol import ExcludedSymbol @@ -36,9 +37,9 @@ class StrategyRule: """Return an object describing what to do with the symbol in STATS. If this rule applies to the symbol whose statistics are collected - in STATS, then return an object of type BranchSymbol, TagSymbol, - or ExcludedSymbol as appropriate. If this rule doesn't apply, - return None.""" + in STATS, then return an object of type Branch, Tag, or + ExcludedSymbol as appropriate. If this rule doesn't apply, return + None.""" raise NotImplementedError @@ -57,8 +58,8 @@ class _RegexpStrategyRule(StrategyRule): it is anchored at the beginning and end of the symbol name). ACTION is the class representing how the symbol should be - converted. It should be one of the classes BranchSymbol, - TagSymbol, or ExcludedSymbol. + converted. It should be one of the classes Branch, Tag, or + ExcludedSymbol. 
If PATTERN matches a symbol name, then get_symbol() returns ACTION(name, id); otherwise it returns None.""" @@ -71,8 +72,8 @@ class _RegexpStrategyRule(StrategyRule): self.action = action def get_symbol(self, stats): - if self.regexp.match(stats.symbol.name): - return self.action(stats.symbol) + if self.regexp.match(stats.lod.name): + return self.action(stats.lod) else: return None @@ -81,14 +82,14 @@ class ForceBranchRegexpStrategyRule(_Reg """Force symbols matching pattern to be branches.""" def __init__(self, pattern): - _RegexpStrategyRule.__init__(self, pattern, BranchSymbol) + _RegexpStrategyRule.__init__(self, pattern, Branch) class ForceTagRegexpStrategyRule(_RegexpStrategyRule): """Force symbols matching pattern to be tags.""" def __init__(self, pattern): - _RegexpStrategyRule.__init__(self, pattern, TagSymbol) + _RegexpStrategyRule.__init__(self, pattern, Tag) class ExcludeRegexpStrategyRule(_RegexpStrategyRule): @@ -108,9 +109,9 @@ class UnambiguousUsageRule(StrategyRule) # Can't decide return None elif is_branch: - return BranchSymbol(stats.symbol) + return Branch(stats.lod) elif is_tag: - return TagSymbol(stats.symbol) + return Tag(stats.lod) else: # The symbol didn't appear at all: return None @@ -121,7 +122,7 @@ class BranchIfCommitsRule(StrategyRule): def get_symbol(self, stats): if stats.branch_commit_count > 0: - return BranchSymbol(stats.symbol) + return Branch(stats.lod) else: return None @@ -134,9 +135,9 @@ class HeuristicStrategyRule(StrategyRule def get_symbol(self, stats): if stats.tag_create_count >= stats.branch_create_count: - return TagSymbol(stats.symbol) + return Tag(stats.lod) else: - return BranchSymbol(stats.symbol) + return Branch(stats.lod) class AllBranchRule(StrategyRule): @@ -147,7 +148,7 @@ class AllBranchRule(StrategyRule): therefore only apply to the symbols not handled earlier.""" def get_symbol(self, stats): - return BranchSymbol(stats.symbol) + return Branch(stats.lod) class AllTagRule(StrategyRule): @@ -161,18 +162,22 @@ 
class AllTagRule(StrategyRule): therefore only apply to the symbols not handled earlier.""" def get_symbol(self, stats): - return TagSymbol(stats.symbol) + return Tag(stats.lod) class SymbolStrategy: """A strategy class, used to decide how to convert CVS symbols.""" def get_symbols(self, symbol_stats): - """Return an iterable of symbols to convert. + """Return a list of TypedSymbol objects telling how to convert symbols. - The values returned by the iterable are BranchSymbol, TagSymbol, - or ExcludedSymbol objects, indicating how the symbol should be - converted. Return None if there was an error.""" + The list values are TypedSymbol objects (Branch, Tag, or + ExcludedSymbol), indicating how each symbol should be converted. + Trunk objects in SYMBOL_STATS should be passed through unchanged. + One object must be included in the return value for each line of + development described in SYMBOL_STATS. + + Return None if there was an error.""" raise NotImplementedError @@ -196,6 +201,9 @@ class RuleBasedSymbolStrategy: self._rules.append(rule) def _get_symbol(self, stats): + if isinstance(stats.lod, Trunk): + return stats.lod + else: for rule in self._rules: symbol = rule.get_symbol(stats) if symbol is not None: @@ -218,8 +226,9 @@ class RuleBasedSymbolStrategy: sys.stderr.write( error_prefix + ": It is not clear how the following symbols " "should be converted.\n" - "Use --force-tag, --force-branch and/or --exclude to resolve the " - "ambiguity.\n") + "Use --force-tag, --force-branch, --exclude, and/or " + "--symbol-default to\n" + "resolve the ambiguity.\n") for stats in mismatches: sys.stderr.write(" %s\n" % stats) return None diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbol_transform.py cvs2svn-2.0.0/cvs2svn_lib/symbol_transform.py --- cvs2svn-1.5.x/cvs2svn_lib/symbol_transform.py 2006-09-25 16:56:22.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/symbol_transform.py 2007-08-15 22:53:53.000000000 +0200 @@ -1,7 +1,7 @@ # (Be in -*- python -*- mode.) 
# # ==================================================================== -# Copyright (c) 2006 CollabNet. All rights reserved. +# Copyright (c) 2006-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -33,7 +33,7 @@ class SymbolTransform: Return the transformed symbol name. If this SymbolTransform doesn't apply, return the original SYMBOL_NAME. - This method is free to use the information inf CVS_FILE (including + This method is free to use the information in CVS_FILE (including CVS_FILE.project) to decide whether and/or how to transform SYMBOL_NAME.""" @@ -41,8 +41,18 @@ class SymbolTransform: class RegexpSymbolTransform(SymbolTransform): + """Transform symbols by using a regexp textual substitution.""" + def __init__(self, pattern, replacement): - self.pattern = re.compile(pattern) + """Create a SymbolTransform that transforms symbols matching PATTERN. + + PATTERN is a regular expression that should match the whole symbol + name. REPLACEMENT is the replacement text, which may include + patterns like r'\1' or r'\g<1>' or r'\g<name>' (where 'name' is a + reference to a named substring in the pattern of the form + r'(?P<name>...)').""" + + self.pattern = re.compile('^' + pattern + '$') self.replacement = replacement def transform(self, cvs_file, symbol_name): diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/symbolings_reader.py cvs2svn-2.0.0/cvs2svn_lib/symbolings_reader.py --- cvs2svn-1.5.x/cvs2svn_lib/symbolings_reader.py 2006-08-20 13:36:35.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_lib/symbolings_reader.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,101 +0,0 @@ -# (Be in -*- python -*- mode.) -# -# ==================================================================== -# Copyright (c) 2000-2006 CollabNet. All rights reserved. 
-# -# This software is licensed as described in the file COPYING, which -# you should have received as part of this distribution. The terms -# are also available at http://subversion.tigris.org/license-1.html. -# If newer versions of this license are posted there, you may use a -# newer version instead, at your option. -# -# This software consists of voluntary contributions made by many -# individuals. For exact contribution history, see the revision -# history and logs, available at http://cvs2svn.tigris.org/. -# ==================================================================== - -"""This module contains database facilities used by cvs2svn.""" - -import cPickle - -from cvs2svn_lib.boolean import * -from cvs2svn_lib import config -from cvs2svn_lib.context import Ctx -from cvs2svn_lib.artifact_manager import artifact_manager -from cvs2svn_lib.database import Database -from cvs2svn_lib.database import DB_OPEN_READ -from cvs2svn_lib.openings_closings import OpeningsClosingsMap -from cvs2svn_lib.symbol_filling_guide import SymbolFillingGuide - - -class SymbolingsReader: - """Provides an interface to the SYMBOL_OPENINGS_CLOSINGS_SORTED file - and the SYMBOL_OFFSETS_DB. Does the heavy lifting of finding and - returning the correct opening and closing Subversion revision - numbers for a given symbolic name.""" - - def __init__(self): - """Opens the SYMBOL_OPENINGS_CLOSINGS_SORTED for reading, and - reads the offsets database into memory.""" - - self.symbolings = open( - artifact_manager.get_temp_file( - config.SYMBOL_OPENINGS_CLOSINGS_SORTED), - 'r') - # The offsets_db is really small, and we need to read and write - # from it a fair bit, so suck it into memory - offsets_db = file( - artifact_manager.get_temp_file(config.SYMBOL_OFFSETS_DB), 'rb') - # A map from symbol_id to offset. 
- self.offsets = cPickle.load(offsets_db) - offsets_db.close() - - def filling_guide_for_symbol(self, symbol, svn_revnum): - """Given SYMBOL and SVN_REVNUM, return a new SymbolFillingGuide object. - - SYMBOL is a TypedSymbol instance. Note that if we encounter an - opening rev in this fill, but the corresponding closing rev takes - place later than SVN_REVNUM, the closing will not be passed to - SymbolFillingGuide in this fill (and will be discarded when - encountered in a later fill). This is perfectly fine, because we - can still do a valid fill without the closing--we always try to - fill what we can as soon as we can.""" - - openings_closings_map = OpeningsClosingsMap(symbol) - - # It's possible to have a branch start with a file that was added - # on a branch - if symbol.id in self.offsets: - # Set our read offset for self.symbolings to the offset for this - # symbol: - self.symbolings.seek(self.offsets[symbol.id]) - - while True: - fpos = self.symbolings.tell() - line = self.symbolings.readline().rstrip() - if not line: - break - id, revnum, type, branch_id, cvs_file_id = line.split() - id = int(id, 16) - cvs_file_id = int(cvs_file_id, 16) - cvs_file = Ctx()._cvs_file_db.get_file(cvs_file_id) - if branch_id == '*': - svn_path = cvs_file.project.make_trunk_path(cvs_file.cvs_path) - else: - branch_id = int(branch_id, 16) - svn_path = cvs_file.project.make_branch_path( - Ctx()._symbol_db.get_symbol(branch_id), cvs_file.cvs_path) - revnum = int(revnum) - if revnum > svn_revnum or id != symbol.id: - break - openings_closings_map.register(svn_path, revnum, type) - - # get current offset of the read marker and set it to the offset - # for the beginning of the line we just read if we used anything - # we read. 
- if not openings_closings_map.is_empty(): - self.offsets[symbol.id] = fpos - - return SymbolFillingGuide(openings_closings_map) - - diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/time_range.py cvs2svn-2.0.0/cvs2svn_lib/time_range.py --- cvs2svn-1.5.x/cvs2svn_lib/time_range.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/time_range.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,47 @@ +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2006 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +"""This module contains a class to manage time ranges.""" + + +from __future__ import generators + +from cvs2svn_lib.boolean import * + + +class TimeRange: + def __init__(self): + # Start out with a t_min higher than any incoming time T, and a + # t_max lower than any incoming T. This way the first T will push + # t_min down to T, and t_max up to T, naturally (without any + # special-casing), and successive times will then ratchet them + # outward as appropriate. + self.t_min = 1L<<32 + self.t_max = 0 + + def add(self, timestamp): + """Expand the range to encompass TIMESTAMP.""" + + if timestamp < self.t_min: + self.t_min = timestamp + if timestamp > self.t_max: + self.t_max = timestamp + + def __cmp__(self, other): + # Sorted by t_max, and break ties using t_min. 
+ return cmp(self.t_max, other.t_max) or cmp(self.t_min, other.t_min) + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_lib/version.py cvs2svn-2.0.0/cvs2svn_lib/version.py --- cvs2svn-1.5.x/cvs2svn_lib/version.py 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_lib/version.py 2007-08-15 22:53:53.000000000 +0200 @@ -0,0 +1,27 @@ +#!/usr/bin/env python +# (Be in -*- python -*- mode.) +# +# ==================================================================== +# Copyright (c) 2007 CollabNet. All rights reserved. +# +# This software is licensed as described in the file COPYING, which +# you should have received as part of this distribution. The terms +# are also available at http://subversion.tigris.org/license-1.html. +# If newer versions of this license are posted there, you may use a +# newer version instead, at your option. +# +# This software consists of voluntary contributions made by many +# individuals. For exact contribution history, see the revision +# history and logs, available at http://cvs2svn.tigris.org/. +# ==================================================================== + +# The version of cvs2svn: +VERSION = '2.0.0' + + +# If this file is run as a script, print the cvs2svn version number to +# stdout: +if __name__ == '__main__': + print VERSION + + diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_rcsparse/README.cvs2svn cvs2svn-2.0.0/cvs2svn_rcsparse/README.cvs2svn --- cvs2svn-1.5.x/cvs2svn_rcsparse/README.cvs2svn 2005-12-11 01:40:10.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_rcsparse/README.cvs2svn 1970-01-01 01:00:00.000000000 +0100 @@ -1,16 +0,0 @@ -This directory provides the 'rcsparse' library that cvs2svn.py depends on. - -Although rcsparse ships with ViewVC, http://viewvc.tigris.org/, -we don't want to just use whatever rcsparse we find installed on -the system -- such as one that's part of a local ViewVC installation. -Instead, we want to use the exact version of rcsparse that is tested & -known to work with this version of cvs2svn.
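The new time_range.py earlier in this patch targets Python 2 (note the 1L literal and __cmp__). A rough Python 3 translation, to illustrate the ratcheting behavior its comments describe:

```python
class TimeRangeSketch:
    """Python 3 sketch of the TimeRange class added by this patch; the
    original uses 1L<<32 and __cmp__, translated here."""

    def __init__(self):
        # Start wider than any incoming timestamp so the first add() sets
        # both ends, and later timestamps ratchet them outward.
        self.t_min = 1 << 32
        self.t_max = 0

    def add(self, timestamp):
        """Expand the range to encompass TIMESTAMP."""
        self.t_min = min(self.t_min, timestamp)
        self.t_max = max(self.t_max, timestamp)

    def sort_key(self):
        # The original __cmp__ orders by t_max, breaking ties with t_min.
        return (self.t_max, self.t_min)

r = TimeRangeSketch()
for t in (1187211234, 1187211230, 1187211240):
    r.add(t)
print(r.t_min, r.t_max)  # 1187211230 1187211240
```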
- -Hence this directory. To upgrade to a newer rcsparse, we just run -'update.sh', which fetches the current trunk versions using -'svn export'. - -We control when these upgrades happen, and install this library under -the name 'cvs2svn_rcsparse', so it won't conflict with any 'rcsparse' -already on the system; cvs2svn is careful to import it as -'cvs2svn_rcsparse'. diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_rcsparse/common.py cvs2svn-2.0.0/cvs2svn_rcsparse/common.py --- cvs2svn-1.5.x/cvs2svn_rcsparse/common.py 2006-05-27 01:25:16.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_rcsparse/common.py 2007-08-15 22:53:54.000000000 +0200 @@ -12,13 +12,9 @@ """common.py: common classes and functions for the RCS parsing tools.""" -import time +import calendar import string -### compat isn't in vclib right now. need to work up a solution -import compat - - class Sink: def set_head_revision(self, revision): pass @@ -185,7 +181,7 @@ class _Parser: if date_fields[0] < EPOCH: raise ValueError, 'invalid year' - timestamp = compat.timegm(tuple(date_fields)) + timestamp = calendar.timegm(tuple(date_fields)) # Parse author ### NOTE: authors containing whitespace are violations of the diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_rcsparse/compat.py cvs2svn-2.0.0/cvs2svn_rcsparse/compat.py --- cvs2svn-1.5.x/cvs2svn_rcsparse/compat.py 2006-05-25 17:41:45.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_rcsparse/compat.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,166 +0,0 @@ -# -*-python-*- -# -# Copyright (C) 1999-2006 The ViewCVS Group. All Rights Reserved. -# -# By using this file, you agree to the terms and conditions set forth in -# the LICENSE.html file which can be found at the top level of the ViewVC -# distribution or at http://viewvc.org/license-1.html. 
-# -# For more information, visit http://viewvc.org/ -# -# ----------------------------------------------------------------------- -# -# compat.py: compatibility functions for operation across Python 1.5.x to 2.2.x -# -# ----------------------------------------------------------------------- - -import urllib -import string -import time -import calendar -import re -import os -import rfc822 -import tempfile -import errno - -# -# urllib.urlencode() is new to Python 1.5.2 -# -try: - urlencode = urllib.urlencode -except AttributeError: - def urlencode(dict): - "Encode a dictionary as application/x-url-form-encoded." - if not dict: - return '' - quote = urllib.quote_plus - keyvalue = [ ] - for key, value in dict.items(): - keyvalue.append(quote(key) + '=' + quote(str(value))) - return string.join(keyvalue, '&') - -# -# time.strptime() is new to Python 1.5.2 -# -if hasattr(time, 'strptime'): - def cvs_strptime(timestr): - 'Parse a CVS-style date/time value.' - return time.strptime(timestr, '%Y/%m/%d %H:%M:%S')[:-1] + (0,) -else: - _re_rev_date = re.compile('([0-9]{4})/([0-9][0-9])/([0-9][0-9]) ' - '([0-9][0-9]):([0-9][0-9]):([0-9][0-9])') - def cvs_strptime(timestr): - 'Parse a CVS-style date/time value.' 
- match = _re_rev_date.match(timestr) - if match: - return tuple(map(int, match.groups())) + (0, 1, 0) - else: - raise ValueError('date is not in cvs format') - -# -# os.makedirs() is new to Python 1.5.2 -# -try: - makedirs = os.makedirs -except AttributeError: - def makedirs(path, mode=0777): - head, tail = os.path.split(path) - if head and tail and not os.path.exists(head): - makedirs(head, mode) - os.mkdir(path, mode) - -# -# rfc822.formatdate() is new to Python 1.6 -# -try: - formatdate = rfc822.formatdate -except AttributeError: - def formatdate(timeval): - if timeval is None: - timeval = time.time() - timeval = time.gmtime(timeval) - return "%s, %02d %s %04d %02d:%02d:%02d GMT" % ( - ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"][timeval[6]], - timeval[2], - ["Jan", "Feb", "Mar", "Apr", "May", "Jun", - "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"][timeval[1]-1], - timeval[0], timeval[3], timeval[4], timeval[5]) - -# -# calendar.timegm() is new to Python 2.x and -# calendar.leapdays() was wrong in Python 1.5.2 -# -try: - timegm = calendar.timegm -except AttributeError: - def leapdays(year1, year2): - """Return number of leap years in range [year1, year2). 
- Assume year1 <= year2.""" - year1 = year1 - 1 - year2 = year2 - 1 - return (year2/4 - year1/4) - (year2/100 - - year1/100) + (year2/400 - year1/400) - - EPOCH = 1970 - def timegm(tuple): - """Unrelated but handy function to calculate Unix timestamp from GMT.""" - year, month, day, hour, minute, second = tuple[:6] - # assert year >= EPOCH - # assert 1 <= month <= 12 - days = 365*(year-EPOCH) + leapdays(EPOCH, year) - for i in range(1, month): - days = days + calendar.mdays[i] - if month > 2 and calendar.isleap(year): - days = days + 1 - days = days + day - 1 - hours = days*24 + hour - minutes = hours*60 + minute - seconds = minutes*60 + second - return seconds - -# -# tempfile.mkdtemp() is new to Python 2.3 -# -try: - mkdtemp = tempfile.mkdtemp -except AttributeError: - def mkdtemp(): - for i in range(10): - dir = tempfile.mktemp() - try: - os.mkdir(dir, 0700) - return dir - except OSError, e: - if e.errno == errno.EEXIST: - continue # try again - raise - - raise IOError, (errno.EEXIST, "No usable temporary directory name found") - -# -# the following stuff is *ONLY* needed for standalone.py. -# For that reason I've encapsulated it into a function. -# - -def for_standalone(): - import SocketServer - if not hasattr(SocketServer.TCPServer, "close_request"): - # - # method close_request() was missing until Python 2.1 - # - class TCPServer(SocketServer.TCPServer): - def process_request(self, request, client_address): - """Call finish_request. - - Overridden by ForkingMixIn and ThreadingMixIn. 
- - """ - self.finish_request(request, client_address) - self.close_request(request) - - def close_request(self, request): - """Called to clean up an individual request.""" - request.close() - - SocketServer.TCPServer = TCPServer diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_rcsparse/texttools.py cvs2svn-2.0.0/cvs2svn_rcsparse/texttools.py --- cvs2svn-1.5.x/cvs2svn_rcsparse/texttools.py 2006-05-25 17:41:45.000000000 +0200 +++ cvs2svn-2.0.0/cvs2svn_rcsparse/texttools.py 2007-08-15 22:53:54.000000000 +0200 @@ -25,7 +25,7 @@ _tt = TextTools _idchar_list = map(chr, range(33, 127)) + map(chr, range(160, 256)) _idchar_list.remove('$') _idchar_list.remove(',') -#_idchar_list.remove('.') leave as part of 'num' symbol +#_idchar_list.remove('.') # leave as part of 'num' symbol _idchar_list.remove(':') _idchar_list.remove(';') _idchar_list.remove('@') @@ -83,40 +83,84 @@ class _mxTokenStream: # construct a tag table which refers to the buffer we need to parse. table = ( - # ignore whitespace. with or without whitespace, move to the next rule. + #1: ignore whitespace. with or without whitespace, move to the next rule. (None, _tt.AllInSet, _tt.whitespace_set, +1), + #2 (_E_COMPLETE, _tt.EOF + _tt.AppendTagobj, _tt.Here, +1, _SUCCESS), - # accumulate token text and exit, or move to the next rule. + #3: accumulate token text and exit, or move to the next rule. (_UNUSED, _tt.AllInSet + _tt.AppendMatch, _idchar_set, +2), + #4 (_E_TOKEN, _tt.EOF + _tt.AppendTagobj, _tt.Here, -3, _SUCCESS), - # single character tokens exit immediately, or move to the next rule + #5: single character tokens exit immediately, or move to the next rule (_UNUSED, _tt.IsInSet + _tt.AppendMatch, _onechar_token_set, +2), + #6 (_E_COMPLETE, _tt.EOF + _tt.AppendTagobj, _tt.Here, -5, _SUCCESS), - # if this isn't an '@' symbol, then we have a syntax error (go to a + #7: if this isn't an '@' symbol, then we have a syntax error (go to a # negative index to indicate that condition). 
otherwise, suck it up # and move to the next rule. (_T_STRING_START, _tt.Is + _tt.AppendTagobj, '@'), + #8 (None, _tt.Is, '@', +4, +1), + #9 (buf, _tt.Is, '@', +1, -1), + #10 (_T_STRING_END, _tt.Skip + _tt.AppendTagobj, 0, 0, +1), + #11 (_E_STRING_END, _tt.EOF + _tt.AppendTagobj, _tt.Here, -10, _SUCCESS), + #12 (_E_STRING_SPAN, _tt.EOF + _tt.AppendTagobj, _tt.Here, +1, _SUCCESS), - # suck up everything that isn't an AT. go to next rule to look for EOF + #13: suck up everything that isn't an AT. go to next rule to look for EOF (buf, _tt.AllInSet, _not_at_set, 0, +1), - # go back to look for double AT if we aren't at the end of the string + #14: go back to look for double AT if we aren't at the end of the string (_E_STRING_SPAN, _tt.EOF + _tt.AppendTagobj, _tt.Here, -6, _SUCCESS), ) + # Fast, texttools may be, but it's somewhat lacking in clarity. + # Here's an attempt to document the logic encoded in the table above: + # + # Flowchart: + # _____ + # / /\ + # 1 -> 2 -> 3 -> 5 -> 7 -> 8 -> 9 -> 10 -> 11 + # | \/ \/ \/ /\ \/ + # \ 4 6 12 14 / + # \_______/_____/ \ / / + # \ 13 / + # \__________________________________________/ + # + # #1: Skip over any whitespace. + # #2: If now EOF, exit with code _E_COMPLETE. + # #3: If we have a series of characters in _idchar_set, then: + # #4: Output them as a token, and go back to #1. + # #5: If we have a character in _onechar_token_set, then: + # #6: Output it as a token, and go back to #1. + # #7: If we do not have an '@', then error. + # If we do, then log a _T_STRING_START and continue. + # #8: If we have another '@', continue on to #9. Otherwise: + # #12: If now EOF, exit with code _E_STRING_SPAN. + # #13: Record the slice up to the next '@' (or EOF). + # #14: If now EOF, exit with code _E_STRING_SPAN. + # Otherwise, go back to #8. + # #9: If we have another '@', then we've just seen an escaped + # (by doubling) '@' within an @-string. Record a slice including + # just one '@' character, and jump back to #8. 
+ # Otherwise, we've *either* seen the terminating '@' of an @-string, + # *or* we've seen one half of an escaped @@ sequence that just + # happened to be split over a chunk boundary - in either case, + # we continue on to #10. + # #10: Log a _T_STRING_END. + # #11: If now EOF, exit with _E_STRING_END. Otherwise, go back to #1. + success, taglist, idx = _tt.tag(buf, table, start) if not success: diff -purNbBwx .svn cvs2svn-1.5.x/cvs2svn_rcsparse/update.sh cvs2svn-2.0.0/cvs2svn_rcsparse/update.sh --- cvs2svn-1.5.x/cvs2svn_rcsparse/update.sh 2005-12-11 01:40:10.000000000 +0100 +++ cvs2svn-2.0.0/cvs2svn_rcsparse/update.sh 1970-01-01 01:00:00.000000000 +0100 @@ -1,12 +0,0 @@ -#!/bin/sh - -set -ex - -VIEWVC_REPOS="http://viewvc.tigris.org/svn/viewvc" - -# Update the rcsparse library from ViewVC -svn export --force "$VIEWVC_REPOS/trunk/lib/vclib/ccvs/rcsparse" . - -# Now update compat.py (which is not in this directory, in the upstream source) -svn export "$VIEWVC_REPOS/trunk/lib/compat.py" - diff -purNbBwx .svn cvs2svn-1.5.x/design-notes.txt cvs2svn-2.0.0/design-notes.txt --- cvs2svn-1.5.x/design-notes.txt 2006-09-16 22:07:20.000000000 +0200 +++ cvs2svn-2.0.0/design-notes.txt 1970-01-01 01:00:00.000000000 +0100 @@ -1,550 +0,0 @@ - How cvs2svn Works - ================= - -A cvs2svn run consists of eight passes. Each pass saves the data it -produces to files on disk, so that a) we don't hold huge amounts of -state in memory, and b) the conversion process is resumable. - -CollectRevsPass (formerly called pass1) -=============== - -The goal of this pass is to write a summary of each CVS file as a -pickled CVSFile to 'cvs2svn-cvs-files.db', and a summary of each CVS -file's revisions as a list of pickled CVSRevisions to -'cvs2svn-cvs-items.pck'. In each case, items are assigned an -arbitrary key that is used to refer to them. - -We walk over the repository, collecting data about the RCS files into -an instance of CollectData. 
Each RCS file is processed with -rcsparse.parse(), which invokes callbacks from an instance of -cvs2svn's _FileDataCollector class (which is a subclass of -rcsparse.Sink). - -For each RCS file, the first thing the parser encounters is the -administrative header, including the head revision, the principal -branch, symbolic names, RCS comments, etc. The main thing that -happens here is that _FileDataCollector.define_tag() is invoked on -each symbolic name and its attached revision, so all the tags and -branches of this file get collected. When this stage is done, the -parser invokes admin_completed(), which writes the CVSFile to the -database. - -Next, the parser hits the revision summary section. That's the part -of the RCS file that looks like this: - - 1.6 - date 2002.06.12.04.54.12; author captnmark; state Exp; - branches - 1.6.2.1; - next 1.5; - - 1.5 - date 2002.05.28.18.02.11; author captnmark; state Exp; - branches; - next 1.4; - - [...] - -For each revision summary, _FileDataCollector.define_revision() is -invoked, recording that revision's metadata in various variables of -the _FileDataCollector class instance. - -After finishing the revision summaries, the parser invokes -_FileDataCollector.tree_completed(), which loops over the revision -information stored, determining if there are instances where a higher -revision was committed "before" a lower one (rare, but it can happen -when there was clock skew on the repository machine). If there are -any, it "resyncs" the timestamp of the earlier rev to be just before -that of the later rev, but saves the original timestamp in -self._rev_data[blah].original_timestamp, so we can later write out a -record to the resync file indicating that an adjustment was made (this -makes it possible to catch the other parts of this commit and resync -them similarly; more details below). - -Next, the parser encounters the *real* revision data, which has the -log messages and file contents. 
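The parser/sink flow described above can be sketched as follows. This is an illustrative stand-in, not the real cvs2svn_rcsparse interface, and the method signatures are simplified:

```python
# Hypothetical sketch of the Sink callback pattern described above: the
# parser reads an RCS file and invokes methods like these on a collector
# object (cvs2svn's _FileDataCollector subclasses rcsparse.Sink).
class SinkSketch:
    def __init__(self):
        self.tags = {}        # symbolic name -> revision
        self.revisions = {}   # revision -> (timestamp, author)

    def define_tag(self, name, revision):
        # Invoked once per symbolic name in the RCS admin header.
        self.tags[name] = revision

    def define_revision(self, revision, timestamp, author):
        # Invoked once per revision summary ("1.6 date ...; author ...;").
        self.revisions[revision] = (timestamp, author)

sink = SinkSketch()
sink.define_tag('some-release-tag', '1.6')          # hypothetical data
sink.define_revision('1.6', 1023857652, 'captnmark')
print(sink.revisions['1.6'])  # (1023857652, 'captnmark')
```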
For each revision, it invokes -_FileDataCollector.set_revision_info(), which writes a record to -'cvs2svn-cvs-items.pck'. - -Also, for resync'd revisions, a line like this is written out to -'cvs2svn-resync.txt': - - 3d6c1329 18a 3d6c1328 - -The fields are: - - NEW_TIMESTAMP METADATA_ID OLD_TIMESTAMP - -(The resync file will be explained later.) - -That's it -- the RCS file is done. - -When every CVS file is done, CollectRevsPass is complete, and: - - - 'cvs2svn-cvs-files.db' contains a record of every CVS file. - - - 'cvs2svn-cvs-items.pck' contains a summary of every revision to - every CVS file, including a reference to the corresponding CVS - file record in 'cvs2svn-cvs-files.db'. The revisions are sorted - in groups, one per CVSFile. But a multi-file commit will still - be scattered all over the place. - - - 'cvs2svn-resync.txt' contains a small amount of resync data, in - no particular order. - - - 'cvs2svn-symbol-stats.pck' contains a pickled list of symbol - statistics entries (instances of - cvs2svn_lib.symbol_statistics._Stats) for each symbol that was - seen in the CVS repository. This includes the following - information: - - ID NAME TAG_COUNT BRANCH_COUNT BRANCH_COMMIT_COUNT BLOCKERS - - where ID is a unique integer identifying this symbol, NAME is the - symbol name, TAG_COUNT and BRANCH_COUNT are the number of CVS - files on which this symbol was used as a tag or branch - respectively, and BRANCH_COMMIT_COUNT is the number of files for - which commits were made on a branch with the given name. - BLOCKERS is a list of other symbols that were defined on branches - named NAME. (A symbol cannot be excluded if it has any blockers - that are not also being excluded.) These data are used to look - for inconsistencies in the use of symbols under CVS and to decide - which symbols can be excluded or forced to be branches and/or - tags. 
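The symbol-statistics record and its blocker rule can be sketched like this (hypothetical class and data; the real code lives in cvs2svn_lib.symbol_statistics):

```python
# Sketch of the ID NAME TAG_COUNT BRANCH_COUNT BRANCH_COMMIT_COUNT BLOCKERS
# record described above, plus the exclusion rule: a symbol can only be
# excluded if every blocker (symbol defined on its branches) is excluded too.
class SymbolStatsSketch:
    def __init__(self, sym_id, name, tag_count, branch_count,
                 branch_commit_count, blockers):
        self.sym_id = sym_id
        self.name = name
        self.tag_count = tag_count
        self.branch_count = branch_count
        self.branch_commit_count = branch_commit_count
        self.blockers = blockers  # names of symbols defined on this branch

def can_exclude(stats, excluded_names):
    return all(b in excluded_names for b in stats.blockers)

s = SymbolStatsSketch(7, 'some-branch', 0, 3, 0, ['some-subtag'])
print(can_exclude(s, {'some-subtag'}))  # True
print(can_exclude(s, set()))            # False
```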
- - - 'cvs2svn-metadata.db' contains information that will help - determine what CVSRevisions are allowed to be combined into a - single SVNCommit. This class maps each CVSRevision to an SHA - digest that is constructed so that CVSRevisions that can be - combined are all mapped to the same digest. - - CVSRevisions that were part of a single CVS commit always have a - common author and log message, therefore these fields are always - included in the digest. Moreover, if ctx.cross_project_commits - is False, we avoid combining CVS revisions from separate projects - by including the project.id in the digest. This database - contains two mappings for each digest: - - digest (40-byte string) -> metadata_id (int) - - metadata_id (int as hex) -> (project_id, author, log_msg,) (tuple) - - The first mapping is used to locate the metadata_id for the - metadata record having a specific digest, and the second is used - as a key to locate the actual metadata. CVSRevision records - include the metadata_id. - - -CollateSymbolsPass -================== - -Use the symbol statistics collected in CollectRevsPass and any -command-line options to determine which symbols should be treated as -branches, which as tags, and which symbols should be excluded from the -conversion altogether. - -Create 'cvs2svn-symbols.pck', which contains a pickle of a list of -BranchSymbol, TagSymbol, and ExcludedSymbol objects indicating how -each symbol should be processed in the conversion. - - -ResyncRevsPass (formerly called pass2) -============== - -This is where the resync file is used. The goal of this pass is to -output the information from 'cvs2svn-cvs-items.pck to a new file, -'cvs2svn-cvs-items-resync.pck' (resynched items) with its -corresponding index file, 'cvs2svn-cvs-items-resync-index.dat'. It -has the same content as the original file, except that the timestamps -of some CVSRevisions have been resynced. 
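The digest scheme described above for 'cvs2svn-metadata.db' can be approximated as below. The exact field encoding is an assumption; the point is that any two CVSRevisions eligible to share an SVNCommit hash to the same 40-character value:

```python
import hashlib

def metadata_digest(author, log_msg, project_id=None,
                    cross_project_commits=True):
    # Sketch of the digest described above: author and log message always
    # contribute; project_id is mixed in only when cross-project commits
    # are disabled.  Field encoding here is illustrative, not cvs2svn's.
    h = hashlib.sha1()
    h.update(author.encode())
    h.update(b'\0' + log_msg.encode())
    if not cross_project_commits and project_id is not None:
        h.update(b'\0' + str(project_id).encode())
    return h.hexdigest()

d1 = metadata_digest('captnmark', 'Doc tweaks.')
d2 = metadata_digest('captnmark', 'Doc tweaks.')
print(len(d1), d1 == d2)  # 40 True
```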
- -First, read the whole resync file into a hash table that maps each -metadata_id to a list of lists. Each sublist represents one of the -timestamp adjustments from CollectRevsPass, and looks like this: - - [old_time_lower, old_time_upper, new_time] - -The reason to map each metadata_id to a list of sublists, instead of -to one list, is that sometimes you'll get the same metadata for -unrelated commits (for example, the same author commits many times -using the empty log message, or a log message that just says "Doc -tweaks."). So each metadata_id may need to "fan out" to cover -multiple commits, but without accidentally unifying those commits. - -Now we loop over the CVSRevisions in 'cvs2svn-cvs-items.pck, and for -each CVSRevision write a line to 'cvs2svn-revs-resync.txt'. Each line -of this file looks like this: - - 3dc32955 5a 12ab - -The fields are: - - 1. a fixed-width timestamp - 2. the metadata_id of the metadata (project, log message, author) - associated with this CVSRevision, as a hexadecimal string. - 3. the integer unique ID for this CVSRevision, as a hexadecimal - string. - -Any CVSRevision record in 'cvs2svn-cvs-items.pck' whose metadata_id -matches some resync entry and appears to be part of the same commit as -one of the sublists in that entry, gets tweaked. The tweak is to -adjust the commit time of the line to the new_time, which is taken -from the resync hash and results from the adjustment described in -CollectRevsPass. - -The way we figure out whether a given line needs to be tweaked is to -loop over all the sublists, seeing if this commit's original time -falls within the old<-->new time range for the current sublist. If it -does, we tweak the line before writing it out, and then conditionally -adjust the sublist's range to account for the timestamp we just -adjusted (since it could be an outlier). 
Note that this could, in -theory, result in separate commits being accidentally unified, since -we might gradually adjust the two sides of the range such that they are -eventually more than COMMIT_THRESHOLD seconds apart. However, this is -really a case of CVS not recording enough information to disambiguate -the commits; we'd know we have a time range that exceeds the -COMMIT_THRESHOLD, but we wouldn't necessarily know where to divide it -up. We could try some clever heuristic, but for now it's not -important -- after all, we're talking about commits that weren't -important enough to have a distinctive log message anyway, so does it -really matter if a couple of them accidentally get unified? Probably -not. - - -SortRevsPass (formerly called pass3) -============ - -This is where we deduce the changesets, that is, the grouping of file -changes into single commits. - -It's very simple -- run 'sort' on 'cvs2svn-revs-resync.txt', -converting it to 'cvs2svn-revs-resync-s.txt'. Because of the way the -data is laid out, this causes commits with the same metadata_id (that -is, the same author, log message, and optionally the same project) to -be grouped together. Poof! We now have the CVS changes grouped by -logical commit. - -In some cases, the changes in a given commit may be interleaved with -other commits that went on at the same time, because the sort gives -precedence to date before metadata_id. However, CreateDatabasesPass -detects this by seeing that the metadata_id is different, and -re-separates the commits. - - -CreateDatabasesPass (formerly called pass4): -=================== - -Find and create a database containing the last CVS revision that is a -source (also referred to as an "opening" revision) for each symbol. -This will result in a database containing key-value pairs whose key is -the id for a CVSRevision, and whose value is a list of symbol ids for -which that CVSRevision is the last "opening." 
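The sort-then-group behavior of SortRevsPass amounts to something like this sketch (hypothetical record values):

```python
from itertools import groupby

# Each line is "TIMESTAMP METADATA_ID CVS_REV_ID" as described above.  A
# plain lexical sort orders by timestamp first; consecutive runs with the
# same metadata_id then form candidate changesets.
lines = [
    '3dc32957 7e 12ad',
    '3dc32955 5a 12ab',
    '3dc32956 5a 12ac',
]
lines.sort()
changesets = [(key, [l.split()[2] for l in group])
              for key, group in groupby(lines, key=lambda l: l.split()[1])]
print(changesets)  # [('5a', ['12ab', '12ac']), ('7e', ['12ad'])]
```

Because the sort key is the timestamp before the metadata_id, an interleaved commit splits a run into several groups, which is exactly the case CreateDatabasesPass re-separates.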
- -The format for this file is: - - 'cvs2svn-symbol-last-cvs-revs.db': - Key Value - CVS Revision ID array of symbol ids - - For example: - - 5c --> [3, 8] - 62 --> [15] - 4d --> [29, 5] - f --> [18, 12] - - -AggregateRevsPass (formerly called pass5) -================= - -Primarily, this pass gathers CVS revisions into Subversion revisions -(a Subversion revision is comprised of one or more CVS revisions) -before we actually begin committing (where "committing" means either -to a Subversion repository or to a dump file). - -This pass does the following: - -1. Creates a database file to map Subversion revision numbers to - SVNCommit instances ('cvs2svn-svn-commits.db'). Creates another - database file to map CVS Revisions to their Subversion Revision - numbers ('cvs2svn-cvs-revs-to-svn-revnums.db'). - -2. When a file is copied to a symbolic name in cvs2svn, there are a - range of valid Subversion revisions that we can copy the file from. - The first valid Subversion revision number for a symbolic name is - called the "Opening", and the first *invalid* Subversion revision - number encountered after the "Opening" is called the "Closing". In - this pass, the SymbolingsLogger class writes out a line (for each - symbolic name that it opens) to cvs2svn-symbolic-names.txt if it is - the first possible source revision (the "opening" revision) for a - copy to create a branch or tag, or if it is the last possible - revision (the "closing" revision) for a copy to create a branch or - tag. Not every opening will have a corresponding closing. - - The format of each line is: - - SYMBOL_ID SVN_REVNUM TYPE BRANCH_ID CVS_FILE_ID - - For example: - - 1c 234 O * 1a7 - 34 245 O * 1a9 - 18a 241 C 34 1a7 - 122 201 O 7e 1b3 - - Here is what the columns mean: - - SYMBOL_ID: The id of the branch or tag that starts or ends in this - CVS Revision (there can be multiples per CVS rev). - - SVN_REVNUM: The Subversion revision number that is the opening or - closing for this SYMBOLIC_NAME. 
- - TYPE: "O" for Openings and "C" for Closings. - - BRANCH_ID: The id of the branch where this opening or closing - happened. '*' denotes the default branch. - - CVS_FILE_ID: The ID of the CVS file where this opening or closing - happened, in hexadecimal. - - See SymbolingsLogger for more details. - - -SortSymbolsPass (formerly called pass6) -=============== - -This pass merely sorts 'cvs2svn-symbolic-names.txt' into -'cvs2svn-symbolic-names-s.txt'. This orders the file first by -symbolic name, and second by Subversion revision number, thus grouping -all openings and closings for each symbolic name together. - - -IndexSymbolsPass (formerly called pass7) -================ - -This pass iterates through all the lines in -'cvs2svn-symbolic-names-s.txt', writing out a database file -('cvs2svn-symbolic-name-offsets.db') mapping SYMBOL_ID to the file -offset in 'cvs2svn-symbolic-names-s.txt' where SYMBOL_ID is first -encountered. This will allow us to seek to the various offsets in the -file and sequentially read only the openings and closings that we -need. - - -OutputPass (formerly called pass8) -========== - -This pass has very little "thinking" to do--it basically opens the -svn-nums-to-cvs-revs.db and, starting with Subversion revision 2 -(revision 1 creates /trunk, /tags, and /branches), sequentially plays -out all the commits to either a Subversion repository or to a -dumpfile. - -In --dumpfile mode, the result of this pass is a Subversion repository -dumpfile (suitable for input to 'svnadmin load'). The dumpfile is the -data's last static stage: last chance to check over the data, run it -through svndumpfilter, move the dumpfile to another machine, etc. - -When not in --dumpfile mode, no full dumpfile is created. Instead, -miniature dumpfiles representing a single revision are created, loaded -into the repository, and then removed. - -In both modes, the dumpfile revisions are created by walking through -'cvs2svn-data.s-revs.txt'. 
- -The databases 'cvs2svn-svn-nodes.db' and 'cvs2svn-svn-revisions.db' -form a skeletal (metadata only, no content) mirror of the repository -structure that cvs2svn is creating. They provide data about previous -revisions that cvs2svn requires while constructing the dumpstream. - - - =============================== - Branches and Tags Plan. - =============================== - -This pass is also where tag and branch creation is done. Since -subversion does tags and branches by copying from existing revisions -(then maybe editing the copy, making subcopies underneath, etc), the -big question for cvs2svn is how to achieve the minimum number of -operations per creation. For example, if it's possible to get the -right tag by just copying revision 53, then it's better to do that -than, say, copying revision 51 and then sub-copying in bits of -revision 52 and 53. - -Also, since CVS does not version symbolic names, there is the -secondary question of *when* to create a particular tag or branch. -For example, a tag might have been made at any time after the youngest -commit included in it, or might even have been made piecemeal; and the -same is true for a branch, with the added constraint that for any -particular file, the branch must have been created before the first -commit on the branch. - -Answering the second question first: cvs2svn creates tags as soon as -possible and branches as late as possible. - -Tags are created as soon as cvs2svn encounters the last CVS Revision -that is a source for that tag. The whole tag is created in one -Subversion commit. - -For branches, this is "just in time" creation -- the moment it sees -the first commit on a branch, it snaps the entire branch into -existence (or as much of it as possible), and then outputs the branch -commit. 
- -The reason we say "as much of it as possible" is that it's possible to -have a branch where some files have branch commits occuring earlier -than the other files even have the source revisions from which the -branch sprouts (this can happen if the branch was created piecemeal, -for example). In this case, we create as much of the branch as we -can, that is, as much of it as there are source revisions available to -copy, and leave the rest for later. "Later" might mean just until -other branch commits come in, or else during a cleanup stage that -happens at the end of this pass (about which more later). - -How just-in-time branch creation works: - -In order to make the "best" set of copies/deletes when creating a -branch, cvs2svn keeps track of two sets of trees while it's making -commits: - - 1. A skeleton mirror of the subversion repository, that is, an - array of revisions, with a tree hanging off each revision. (The - "array" is actually implemented as an anydbm database itself, - mapping string representations of numbers to root keys.) - - 2. A tree for each CVS symbolic name, and the svn file/directory - revisions from which various parts of that tree could be copied. - -Both tree sets live in anydbm databases, using the same basic schema: -unique keys map to marshal.dumps() representations of dictionaries, -which in turn map entry names to other unique keys: - - root_key ==> { entryname1 : entrykey1, entryname2 : entrykey2, ... } - entrykey1 ==> { entrynameX : entrykeyX, ... } - entrykey2 ==> { entrynameY : entrykeyY, ... } - entrykeyX ==> { etc, etc ...} - entrykeyY ==> { etc, etc ...} - -(The leaf nodes -- files -- are also dictionaries, for simplicity.) - -The repository mirror allows cvs2svn to remember what paths exist in -what revisions. - -For details on how branches and tags are created, please see the -docstring the SymbolingsLogger class (and its methods). 
- --*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- - --*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- - -Some older notes and ideas about cvs2svn. Not deleted, because they -may contain suggestions for future improvements in design. - ------------------------------------------------------------------------ - -An email from John Gardiner Myers <jgmyers@speakeasy.net> about some -considerations for the tool. - ------- -From: John Gardiner Myers <jgmyers@speakeasy.net> -Subject: Thoughts on CVS to SVN conversion -To: gstein@lyra.org -Date: Sun, 15 Apr 2001 17:47:10 -0700 - -Some things you may want to consider for a CVS to SVN conversion utility: - -If converting a CVS repository to SVN takes days, it would be good for -the conversion utility to keep its progress state on disk. If the -conversion fails halfway through due to a network outage or power -failure, that would allow the conversion to be resumed where it left off -instead of having to start over from an empty SVN repository. - -It is a short step from there to allowing periodic updates of a -read-only SVN repository from a read/write CVS repository. This allows -the more relaxed conversion procedure: - -1) Create SVN repository writable only by the conversion tool. -2) Update SVN repository from CVS repository. -3) Announce the time of CVS to SVN cutover. -4) Repeat step (2) as needed. -5) Disable commits to CVS repository, making it read-only. -6) Repeat step (2). -7) Enable commits to SVN repository. -8) Wait for developers to move their workspaces to SVN. -9) Decomission the CVS repository. - -You may forward this message or parts of it as you seem fit. ------- - ------------------------------------------------------------------------ - -Further design thoughts from Greg Stein <gstein@lyra.org> - -* timestamp the beginning of the process. 
ignore any commits that - occur after that timestamp; otherwise, you could miss portions of a - commit (e.g. scan A; commit occurs to A and B; scan B; create SVN - revision for items in B; we missed A) - -* the above timestamp can also be used for John's "grab any updates - that were missed in the previous pass." - -* for each file processed, watch out for simultaneous commits. this - may cause a problem during the reading/scanning/parsing of the file, - or the parse succeeds but the results are garbaged. this could be - fixed with a CVS lock, but I'd prefer read-only access. - - algorithm: get the mtime before opening the file. if an error occurs - during reading, and the mtime has changed, then restart the file. if - the read is successful, but the mtime changed, then restart the - file. - -* use a separate log to track unique branches and non-branched forks - of revision history (Q: is it possible to create, say, 1.4.1.3 - without a "real" branch?). this log can then be used to create a - /branches/ directory in the SVN repository. - - Note: we want to determine some way to coalesce branches across - files. It can't be based on name, though, since the same branch name - could be used in multiple places, yet they are semantically - different branches. Given files R, S, and T with branch B, we can - tie those files' branch B into a "semantic group" whenever we see - commit groups on a branch touching multiple files. Files that are - have a (named) branch but no commits on it are simply ignored. For - each "semantic group" of a branch, we'd create a branch based on - their common ancestor, then make the changes on the children as - necessary. For single-file commits to a branch, we could use - heuristics (pathname analysis) to add these to a group (and log what - we did), or we could put them in a "reject" kind of file for a human - to tell us what to do (the human would edit a config file of some - kind to instruct the converter). 
- -* if we have access to the CVSROOT/history, then we could process tags - properly. otherwise, we can only use heuristics or configuration - info to group up tags (branches can use commits; there are no - commits associated with tags) - -* ideally, we store every bit of data from the ,v files to enable a - complete restoration of the CVS repository. this could be done by - storing properties with CVS revision numbers and stuff (i.e. all - metadata not already embodied by SVN would go into properties) - -* how do we track the "states"? I presume "dead" is simply deleting - the entry from SVN. what are the other legal states, and do we need - to do anything with them? - -* where do we put the "description"? how about locks, access list, - keyword flags, etc. - -* note that using something like the SourceForge repository will be an - ideal test case. people *move* their repositories there, which means - that all kinds of stuff can be found in those repositories, from - wherever people used to run them, and under whatever development - policies may have been used. - - For example: I found one of the projects with a "permissions 644;" - line in the "gnuplot" repository. Most RCS releases issue warnings - about that (although they properly handle/skip the lines), and CVS - ignores RCS newphrases altogether. - -# vim:tw=70 diff -purNbBwx .svn cvs2svn-1.5.x/dist.sh cvs2svn-2.0.0/dist.sh --- cvs2svn-1.5.x/dist.sh 2006-10-03 15:06:17.000000000 +0200 +++ cvs2svn-2.0.0/dist.sh 2007-08-15 22:53:54.000000000 +0200 @@ -3,7 +3,7 @@ set -e # Build a cvs2svn distribution. 
-VERSION=`python -c '__name__=""; execfile("cvs2svn"); print VERSION'` +VERSION=`python cvs2svn_lib/version.py` echo "Building cvs2svn ${VERSION}" WC_REV=`svnversion -n .` DIST_BASE=cvs2svn-${VERSION} diff -purNbBwx .svn cvs2svn-1.5.x/doc/design-notes.txt cvs2svn-2.0.0/doc/design-notes.txt --- cvs2svn-1.5.x/doc/design-notes.txt 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/doc/design-notes.txt 2007-08-15 22:53:54.000000000 +0200 @@ -0,0 +1,800 @@ + How cvs2svn Works + ================= + + Theory and requirements + ------ --- ------------ + +There are two main problems in converting a CVS repository to SVN: + +- CVS does not record enough information to determine what actually + happened to a repository. For example, CVS does not record: + + - Which file modifications were part of the same commit + + - The timestamp of tag and branch creations + + - Exactly which revision was the base of a branch (there is + ambiguity between x.y, x.y.2.0, x.y.4.0, etc.) + + - When the default branch was changed (for example, from a vendor + branch back to trunk). + +- The timestamps in a CVS archive are not reliable. It can easily + happen that timestamps are not even monotonic, and large errors (for + example due to a failing server clock battery) are not unusual. + +The absolutely crucial, sine qua non requirement of a conversion is +that the dependency relationships within a file be honored, mainly: + +- A revision depends on its predecessor + +- A branch creation depends on the revision from which it branched, + and commits on the branch depend on the branch creation + +- A tag creation depends on the revision being tagged + +These dependencies are reliably defined in the CVS repository, and +they trump all others, so they are the scaffolding of the conversion. + +Moreover, it is highly desirable that the timestamps of the SVN +commits be monotonically increasing.
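The dependency requirements above can be illustrated with a small sketch (hypothetical revision names, and a plain depth-first topological sort rather than cvs2svn's actual code):

```python
# Illustrative sketch of the per-file dependency rules listed above;
# the revision names are hypothetical, not taken from a real repository.
deps = {
    "1.1": [],
    "1.2": ["1.1"],           # a revision depends on its predecessor
    "branch-B": ["1.2"],      # a branch creation depends on its sprout revision
    "1.2.2.1": ["branch-B"],  # a branch commit depends on the branch creation
    "tag-T": ["1.2"],         # a tag creation depends on the tagged revision
}

def commit_order(deps):
    """Depth-first topological sort: every item is emitted after
    all of the items it depends on."""
    order, seen = [], set()
    def visit(item):
        if item not in seen:
            seen.add(item)
            for dep in deps[item]:
                visit(dep)
            order.append(item)
    for item in deps:
        visit(item)
    return order

order = commit_order(deps)
```

Any commit order produced this way honors the per-file dependencies, which is exactly the property the conversion must preserve.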
+ +Within these constraints we also want the results of the conversion to +resemble the history of the CVS repository as closely as possible. +For example, the set of file changes grouped together in an SVN commit +should be the same as the files changed within the corresponding CVS +commit, insofar as that can be achieved in a manner that is consistent +with the dependency requirements. And the SVN commit timestamps +should recreate the time of the CVS commit as far as possible without +violating the monotonicity requirement. + +The basic idea of the conversion is this: create the largest +conceivable changesets, then split up changesets as necessary to break +any cycles in the graph of changeset dependencies. When all cycles +have been removed, then do a topological sort of the changesets (with +ambiguities resolved using CVS timestamps) to determine a +self-consistent changeset commit order. + +The quality of the conversion (not in terms of correctness, but in +terms of minimizing the number of svn commits) is mostly determined by +the cleverness of the heuristics used to split up cycles. And all of +this has to be affordable, especially in terms of conversion time and +RAM usage, for even the largest CVS repositories. + + + Implementation + -------------- + +A cvs2svn run consists of a number of passes. Each pass saves the +data it produces to files on disk, so that a) we don't hold huge +amounts of state in memory, and b) the conversion process is +resumable. + +CollectRevsPass (formerly called pass1) +=============== + +The goal of this pass is to collect from the CVS files all of the data +that will be required for the conversion. If the --use-internal-co +option was used, this pass also collects the file delta data; for +--use-rcs or --use-cvs, the actual file contents are read again in +OutputPass. + +To collect this data, we walk over the repository, collecting data +about the RCS files into an instance of CollectData.
Each RCS file is +processed with rcsparse.parse(), which invokes callbacks from an +instance of cvs2svn's _FileDataCollector class (which is a subclass of +rcsparse.Sink). + +While a file is being processed, all of the data for the file (except +for contents and log messages) is held in memory. When the file has +been read completely, its data is converted into an instance of +CVSFileItems, and this instance is manipulated a bit then pickled and +stored to 'cvs-items.pck'. + +For each RCS file, the first thing the parser encounters is the +administrative header, including the head revision, the principal +branch, symbolic names, RCS comments, etc. The main thing that +happens here is that _FileDataCollector.define_tag() is invoked on +each symbolic name and its attached revision, so all the tags and +branches of this file get collected. + +Next, the parser hits the revision summary section. That's the part +of the RCS file that looks like this: + + 1.6 + date 2002.06.12.04.54.12; author captnmark; state Exp; + branches + 1.6.2.1; + next 1.5; + + 1.5 + date 2002.05.28.18.02.11; author captnmark; state Exp; + branches; + next 1.4; + + [...] + +For each revision summary, _FileDataCollector.define_revision() is +invoked, recording that revision's metadata in various variables of +the _FileDataCollector class instance. + +Next, the parser encounters the *real* revision data, which has the +log messages and file contents. For each revision, it invokes +_FileDataCollector.set_revision_info(), which sets some more fields in +_RevisionData. It also invokes RevisionRecorder.record_text(), which +gives the RevisionRecorder the chance to record the file text if +desired. record_text() is allowed to return a token, which is carried +along with the CVSRevision data and can be used by RevisionReader to +retrieve the text in OutputPass.
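The callback flow just described can be sketched roughly as follows; the class below is a simplified stand-in for _FileDataCollector (not the real one), and the parser is faked by invoking the callbacks directly with values from the sample RCS text above:

```python
# Simplified stand-in for the rcsparse.Sink-style collector described
# above; illustrative only, not cvs2svn's actual _FileDataCollector.
class FileDataCollector:
    def __init__(self):
        self.tags = {}        # symbolic name -> revision it is attached to
        self.revisions = {}   # revision -> metadata dict

    def define_tag(self, name, revision):
        # called once per symbolic name in the RCS administrative header
        self.tags[name] = revision

    def define_revision(self, revision, timestamp, author, state, branches, next):
        # called once per entry in the revision summary section
        self.revisions[revision] = {
            "timestamp": timestamp, "author": author,
            "state": state, "branches": branches, "next": next,
        }

# Drive the callbacks by hand, as a parser would for the sample above
# (the tag name is hypothetical):
c = FileDataCollector()
c.define_tag("release-1_0", "1.5")
c.define_revision("1.6", "2002.06.12.04.54.12", "captnmark", "Exp",
                  ["1.6.2.1"], "1.5")
c.define_revision("1.5", "2002.05.28.18.02.11", "captnmark", "Exp",
                  [], "1.4")
```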
+ +When the parser is done with the file, _ProjectDataCollector takes the +resulting CVSFileItems object and manipulates it to handle some CVS +features: + + - If the file had a vendor branch, make some adjustments to the + file dependency graph to reflect implicit dependencies related to + the vendor branch. Also delete the 1.1 revision in the usual + case that it doesn't contain any useful information. + + - If the file was added on a branch rather than on trunk, then + delete the "dead" 1.1 revision on trunk in the usual case that it + doesn't contain any useful information. + + - If the file was added on a branch after it already existed on + trunk, then recent versions of CVS add an extra "dead" revision + on the branch. Remove this revision in the usual case that it + doesn't contain any useful information, and sever the branch from + trunk (since the branch version is independent of the trunk + version). + + - If the conversion was started with the --trunk-only option, then + + 1. graft any non-trunk default branch revisions onto trunk + (because they affect the history of the default branch), and + + 2. delete all branches and tags and all remaining branch + revisions. + +Finally, the RevisionRecorder.finish_file() callback is called, the +CVSFileItems instance is stored to a database, and statistics about +how symbols were used in the file are recorded. + +That's it -- the RCS file is done. + +When every CVS file is done, CollectRevsPass is complete, and: + + - The basic information about each file (filename, path, etc) is + written as a pickled CVSFile instance to 'cvs-files.db'. + + - Information about each symbol seen, along with statistics like + how often it was used as a branch or tag, is written as a pickled + symbol_statistics._Stat object to 'symbol-statistics.pck'. 
This + includes the following information: + + ID -- a unique positive identifying integer + + NAME -- the symbol name + + TAG_CREATE_COUNT -- the number of times the symbol was used + as a tag + + BRANCH_CREATE_COUNT -- the number of times the symbol was + used as a branch + + BRANCH_COMMIT_COUNT -- the number of files in which there was + a commit on a branch with this name. + + BRANCH_BLOCKERS -- the set of other symbols that ever + sprouted from a branch with this name. (A symbol cannot + be excluded from the conversion unless all of its + blockers are also excluded.) + + POSSIBLE_PARENTS -- for each other branch, the number of + files in which it could have served as the symbol's source. + + These data are used to look for inconsistencies in the use of + symbols under CVS and to decide which symbols can be excluded or + forced to be branches and/or tags. The POSSIBLE_PARENTS data is + used to pick the "optimum" parent from which the symbol should + sprout in as many files as possible. + + For a multiproject conversion, distinct symbol records (and IDs) + are created for symbols in separate projects, even if they have + the same name. This is to prevent symbols in separate projects + from being filled at the same time. + + - Information about each CVS event is converted into a CVSItem + instance and stored to 'cvs-items.pck'. There are several types + of CVSItems: + + CVSRevision -- A specific revision of a specific CVS file. + + CVSBranch -- The creation of a branch tag in a specific CVS + file. + + CVSTag -- The creation of a non-branch tag in a specific CVS + file. + + The CVSItems are grouped into CVSFileItems instances, one per + CVSFile. But a multi-file commit will still be scattered all + over the place. + + - Selected metadata for each CVS revision, including the author and + log message, is written to 'metadata.db'.
The purpose is + twofold: first, to save space by not having to save this + information multiple times, and second because CVSRevisions that + have the same metadata are candidates to be combined into an SVN + changeset. + + First, an SHA digest is created for each set of metadata. The + digest is constructed so that CVSRevisions that can be combined + are all mapped to the same digest. CVSRevisions that were part + of a single CVS commit always have a common author and log + message, therefore these fields are always included in the + digest. Moreover: + + - if ctx.cross_project_commits is False, we avoid combining CVS + revisions from separate projects by including the project.id in + the digest. + + - if ctx.cross_branch_commits is False, we avoid combining CVS + revisions from different branches by including the branch name + in the digest. + + During the database creation phase, the database keeps track of a + map + + digest (20-byte string) -> metadata_id (int) + + to allow the record for a set of metadata to be located + efficiently. As data are collected, it stores a map + + metadata_id (int as hex) -> (author, log_msg,) (tuple) + + into the database for use in future passes. CVSRevision records + include the metadata_id. + +During this run, each CVSFile, Symbol, CVSItem, and metadata record is +assigned an arbitrary unique ID that is used throughout the conversion +to refer to it. + + +CollateSymbolsPass +================== + +Use the symbol statistics collected in CollectRevsPass and any runtime +options to determine which symbols should be treated as branches, +which as tags, and which should be excluded from the conversion +altogether. + +Create 'symbols.pck', which contains a pickle of a list of TypedSymbol +(Branch, Tag, or ExcludedSymbol) instances indicating how each symbol +should be processed in the conversion. 
The ID used for a TypedSymbol +is the same as the ID allocated to the corresponding symbol in +CollectRevsPass, so references in CVSItems do not have to be updated. + + +FilterSymbolsPass +================= + +This pass works through the CVSFileItems instances stored in +'cvs-items.pck', processing all of the items from each file as a +group. (This is the last pass in which all of the CVSItems for a file +are in memory at once.) It does the following things: + + - Exclude any symbols that CollateSymbolsPass determined should be + excluded, and any revisions on such branches. Also delete + references from other CVSItems to those that are being deleted. + + - Transform any branches to tags or vice versa, also depending on + the results of CollateSymbolsPass, and fix up the references from + other CVSItems. + + - Decide what line of development to use as the parent for each + symbol in the file, and adjust the file's dependency tree + accordingly. + + - For each CVSRevision, record the list of symbols that the + revision opens and closes. + + - Write the surviving CVSItems to the indexed store in files + 'cvs-items-filtered-index.dat' and 'cvs-items-filtered.pck'. + + - Write a summary of each surviving CVSRevision to + 'revs-summary.txt'. Each line of the file has the format + + METADATA_ID TIMESTAMP CVS_REVISION_ID + + where TIMESTAMP is a fixed-width timestamp. These summaries will + be sorted in SortRevisionSummaryPass, then used by + InitializeChangesetsPass to create preliminary + RevisionChangesets. + + - Write a summary of CVSSymbols to 'symbols-summary.txt'. Each + line of the file has the format + + SYMBOL_ID CVS_SYMBOL_ID + + This information will be sorted by SYMBOL_ID in + SortSymbolSummaryPass, then used to create preliminary + SymbolChangesets. + + +SortRevisionSummaryPass +======================= + +Sort the revision summary written by FilterSymbolsPass, creating +'revs-summary-s.txt'.
The sort groups items that might be added to +the same changeset together and, within a group, sorts revisions by +timestamp. This step makes it easy for InitializeChangesetsPass to +read the initial draft of RevisionChangesets straight from the file. + + +SortSymbolSummaryPass +===================== + +Sort the symbol summary written by FilterSymbolsPass, creating +'symbols-summary-s.txt'. The sort groups together symbol items that +might be added to the same changeset (though not in anything +resembling chronological order). The output of this pass is used by +InitializeChangesetsPass. + + +InitializeChangesetsPass +======================== + +This pass creates first-draft changesets, splitting them using +COMMIT_THRESHOLD and breaking up any revision changesets that have +internal dependencies. + +The raw material for creating revision changesets is +'revs-summary-s.txt', which already has CVSRevisions sorted in such a +way that potential changesets are grouped together and sorted by date. +The contents of this file are read line by line, and the corresponding +CVSRevisions are accumulated into a changeset. Whenever the +metadata_id changes, or whenever there is a time gap of more than +COMMIT_THRESHOLD (currently set to 5 minutes) between CVSRevisions, +then a new changeset is started. + +At this point a revision changeset can have internal dependencies if +two commits were made to the same file with the same log message +within COMMIT_THRESHOLD of each other. The next job of this pass is +to split up changesets in such a way as to break such internal +dependencies. This is done by sorting the CVSRevisions within a +changeset by timestamp, then choosing the split point that breaks the +most internal dependencies. This procedure is continued recursively +until there are no more dependencies internal to a single changeset. + +Analogously, the CVSSymbol items from 'symbols-summary-s.txt' are +grouped into symbol changesets.
(Symbol changesets cannot have +internal dependencies, so there is no need to break them up at this +stage.) + +Finally, this pass writes a new version of the CVSItem database with +the CVSItems written in order, grouped by the preliminary changeset to +which they belong. Even though the preliminary changesets still have +to be split up to form final changesets, grouping the CVSItems this +way improves the locality of disk accesses and thereby speeds up later +passes. + +The results of this pass are the following databases: + + - 'cvs-item-to-changeset.dat', which maps CVSItem ids to the id of + the changeset containing the item. + + - 'changesets.pck' and 'changesets-index.dat', which contain the + changeset objects themselves, indexed by changeset id. + + - 'cvs-items-sorted-index.dat' and 'cvs-items-sorted.pck', which + contain the pickled CVSItems ordered by changeset. + + +BreakRevisionChangesetCyclesPass +================================ + +There can still be cycles in the dependency graph of +RevisionChangesets caused by: + + - Interleaved commits. Since CVS commits are not atomic, it can + happen that two commits are in progress at the same time and each + alters the same two files, but in different orders. These should + be small cycles involving only a few revision changesets. To + resolve these cycles, one or more of the RevisionChangesets have + to be split up (eventually becoming separate svn commits). + + - Cycles involving a RevisionChangeset formed by the accidental + combination of unrelated items within a short period of time that + have the same author and log message. These should also be small + cycles involving only a few changesets. + +The job of this pass is to break up such cycles (those involving only +CVSRevisions). + +This pass works by building up the graph of revision changesets and +their dependencies in memory, then attempting a topological sort of +the changesets.
Whenever the topological sort stalls, that implies +the existence of a cycle, one of which can easily be determined. This +cycle is broken through the use of heuristics that try to determine an +"efficient" way of splitting one or more of the changesets that are +involved. + +The new RevisionChangesets are written to +'cvs-item-to-changeset-revbroken.dat', 'changesets-revbroken.pck', and +'changesets-revbroken-index.dat', along with the unmodified +SymbolChangesets. These files are in the same format as the analogous +files produced by InitializeChangesetsPass. + + +RevisionTopologicalSortPass +=========================== + +Topologically sort the RevisionChangesets, thereby picking the order +in which the RevisionChangesets will be committed. (Since the +previous pass eliminated any dependency cycles, this sort is +guaranteed to succeed.) Ambiguities in the topological sort are +resolved using the changesets' timestamps. Then simplify the +changeset graph into a linear chain by converting each +RevisionChangeset into an OrderedChangeset that stores dependency +links only to its commit-order predecessor and successor. This +simplified graph enforces the commit order that resulted from the +topological sort, even after the SymbolChangesets are added back into +the graph later. Store the OrderedChangesets into +'changesets-revsorted.pck' and 'changesets-revsorted-index.dat' along +with the unmodified SymbolChangesets. + + +BreakSymbolChangesetCyclesPass +============================== + +It is possible for there to be cycles in the graph of SymbolChangesets +caused by: + + - Split creation of branches. It is possible that branch A depends + on branch B in one file, but B depends on A in another file. + These cycles can be large, but they only involve + SymbolChangesets. + +Break up such dependency loops. Output the results to +'cvs-item-to-changeset-symbroken.dat', +'changesets-symbroken-index.dat', and 'changesets-symbroken.pck'. 
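The stall-detection idea used by these cycle-breaking passes can be sketched like this (illustrative only; the real passes go on to split the changesets involved in the cycle using heuristics):

```python
# Sketch of the stall-detection idea described above (illustrative
# names; the real passes also apply splitting heuristics). A
# Kahn-style topological sort either consumes every changeset or
# stalls, and a stall proves that the remaining changesets contain
# a cycle.
def toposort_or_cycle(deps):
    """deps maps changeset -> set of changesets it depends on.
    Returns (order, remaining); non-empty remaining implies a cycle."""
    deps = {c: set(d) for c, d in deps.items()}
    order = []
    while deps:
        ready = [c for c, d in deps.items() if not d]
        if not ready:
            return order, set(deps)   # stalled: a cycle exists here
        for c in sorted(ready):       # ties broken deterministically
            order.append(c)
            del deps[c]
        for d in deps.values():
            d.difference_update(ready)
    return order, set()

# Two interleaved commits that altered the same files in opposite
# orders produce a two-changeset cycle:
order, cyclic = toposort_or_cycle({"A": {"B"}, "B": {"A"}, "C": set()})
```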
+ + +BreakAllChangesetCyclesPass +=========================== + +The complete changeset graph (including both RevisionChangesets and +BranchChangesets) can still have dependency cycles caused by: + + - Split creation of branches. The same branch tag can be added to + different files at completely different times. It is possible + that the revision that was branched later depends on a + RevisionChangeset that involves a file on the branch that was + created earlier. These cycles can be large, but they always + involve a SymbolChangeset. To resolve these cycles, the + SymbolChangeset is split up into two changesets. + +In fact, tag changesets do not have to be considered--CVSTags cannot +participate in dependency cycles because no other CVSItem can depend +on a CVSTag. + +Since the input of this pass has been through +RevisionTopologicalSortPass, all revision cycles have already been +broken up and the order that the RevisionChangesets will be committed +has been determined. In this pass, the complete changeset graph is +created in memory, including the linear list of OrderedChangesets from +RevisionTopologicalSortPass plus all of the symbol changesets. +Because this pass doesn't break up any OrderedChangesets, it is +constrained to finding places within the revision changeset sequence +in which the symbol changeset commits can be inserted. + +The new changesets are written to +'cvs-item-to-changeset-allbroken.dat', 'changesets-allbroken.pck', and +'changesets-allbroken-index.dat', which are in the same format as the +analogous files produced by InitializeChangesetsPass. + + +TopologicalSortPass +=================== + +Now that the earlier passes have broken up any dependency cycles among +the changesets, it is possible to order all of the changesets in such +a way that all of a changeset's dependencies are committed before the +changeset itself.
This pass does so by again building up the graph of +changesets in memory, then at each step picking a changeset that has +no remaining dependencies and removing it from the graph. Whenever +more than one dependency-free changeset is available, symbol +changesets are chosen before revision changesets. As changesets are +processed, the timestamp sequence is ensured to be monotonic by the +simple expedient of adjusting retrograde timestamps to be later than +their predecessor. Timestamps that lie in the future, on the other +hand, are assumed to be bogus and are adjusted backwards, also to be +just later than their predecessor. + +This pass writes a line to 'changesets-s.txt' for each +RevisionChangeset, in the order that the changesets should be +committed. Each line contains + + CHANGESET_ID TIMESTAMP + +where CHANGESET_ID is the id of the changeset in the +'changesets-allbroken' databases and TIMESTAMP is the timestamp that +should be assigned to it when it is committed. Both values are +written in hexadecimal. + + +CreateRevsPass (formerly called pass5) +============== + +This pass generates SVNCommits from Changesets and records symbol +openings and closings. (One Changeset can result in multiple +SVNCommits, for example if it causes symbols to be filled or copies to +a vendor branch.) + +This pass does the following: + +1. Creates a database file to map Subversion revision numbers to + SVNCommit instances ('svn-commits-index.dat' and + 'svn-commits.pck'). Creates another database file to map CVS + Revisions to their Subversion Revision numbers + ('cvs-revs-to-svn-revnums.db'). + +2. When a file is copied to a symbolic name in cvs2svn, it is copied + from a specific source: either a CVSRevision, or a copy created by + a previous CVSBranch of the file. The copy has to be made from an + SVN revision that is during the lifetime of the source.
The SVN + revision when the source was created is called the symbol's + "opening", and the SVN revision when it was deleted or overwritten + is called the symbol's "closing". In this pass, the + SymbolingsLogger class writes out a line to 'symbolic-names.txt' + for each symbol opening or closing. Note that some openings do not + have closings, namely if the corresponding source is still present + at the HEAD revision. + + The format of each line is: + + SYMBOL_ID SVN_REVNUM TYPE CVS_SYMBOL_ID + + For example: + + 1c 234 O 1a7 + 34 245 O 1a9 + 18a 241 C 1a7 + 122 201 O 1b3 + + Here is what the columns mean: + + SYMBOL_ID -- The id of the branch or tag that has an opening in + this SVN_REVNUM, in hexadecimal. + + SVN_REVNUM -- The Subversion revision number in which the opening + or closing occurred. (There can be multiple openings and + closings per SVN_REVNUM). + + TYPE -- "O" for openings and "C" for closings. + + CVS_SYMBOL_ID -- The id of the CVSSymbol instance whose opening or + closing is being described, in hexadecimal. + + Each CVSSymbol that tags a non-dead file has exactly one opening + and either zero or one closing. The closing, if it exists, always + occurs in a later SVN revision than the opening. + + See SymbolingsLogger for more details. + + +SortSymbolsPass (formerly called pass6) +=============== + +This pass sorts 'symbolic-names.txt' into 'symbolic-names-s.txt'. +This orders the file first by symbol ID, and second by Subversion +revision number, thus grouping all openings and closings for each +symbolic name together. + + +IndexSymbolsPass (formerly called pass7) +================ + +This pass iterates through all the lines in 'symbolic-names-s.txt', +writing out a pickle file ('symbol-offsets.pck') mapping SYMBOL_ID to +the file offset in 'symbolic-names-s.txt' where SYMBOL_ID is first +encountered. This will allow us to seek to the various offsets in the +file and sequentially read only the openings and closings that we +need. 
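The offset-index technique can be sketched as follows, with an in-memory buffer standing in for 'symbolic-names-s.txt' (the line format matches the description above; the ids are illustrative):

```python
# Sketch of the offset-index technique described above, using an
# in-memory buffer in place of 'symbolic-names-s.txt' (ids are
# illustrative).
import io

sorted_lines = (
    "1c 234 O 1a7\n"
    "1c 241 C 1a7\n"
    "34 245 O 1a9\n"
)

# IndexSymbolsPass: record where each SYMBOL_ID is first encountered.
f = io.StringIO(sorted_lines)
offsets = {}
while True:
    pos = f.tell()
    line = f.readline()
    if not line:
        break
    offsets.setdefault(line.split()[0], pos)

# Later passes: seek straight to one symbol's group of lines and
# read only its openings and closings.
def openings_and_closings(f, offsets, symbol_id):
    f.seek(offsets[symbol_id])
    group = []
    for line in f:
        if line.split()[0] != symbol_id:
            break
        group.append(line.split())
    return group

rows = openings_and_closings(f, offsets, "1c")
```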
+ + +OutputPass (formerly called pass8) +========== + +This pass opens the svn-commits database and, starting with Subversion +revision 2 (revision 1 creates /trunk, /tags, and /branches), +sequentially plays out all the commits to either a Subversion +repository or to a dumpfile. It also decides what sources to use to +fill symbols. + +In --dumpfile mode, the result of this pass is a Subversion repository +dumpfile (suitable for input to 'svnadmin load'). The dumpfile is the +data's last static stage: last chance to check over the data, run it +through svndumpfilter, move the dumpfile to another machine, etc. + +When not in --dumpfile mode, no full dumpfile is created. Instead, +miniature dumpfiles, each representing a single revision, are created, +loaded into the repository, and then removed. + +In both modes, the dumpfile revisions are created by walking through +'data.s-revs.txt'. + +The databases 'svn-nodes.db' and 'svn-revisions.db' form a skeletal +(metadata only, no content) mirror of the repository structure that +cvs2svn is creating. They provide data about previous revisions that +cvs2svn requires while constructing the dumpstream. + + + =============================== + Branches and Tags Plan. + =============================== + +This pass is also where tag and branch creation is done. Since +subversion does tags and branches by copying from existing revisions +(then maybe editing the copy, making subcopies underneath, etc), the +big question for cvs2svn is how to achieve the minimum number of +operations per creation. For example, if it's possible to get the +right tag by just copying revision 53, then it's better to do that +than, say, copying revision 51 and then sub-copying in bits of +revision 52 and 53. + +Tags are created as soon as cvs2svn encounters the last CVS Revision +that is a source for that tag. The whole tag is created in one +Subversion commit. + +Branches are created as soon as all of their prerequisites are in +place.
If a branch creation had to be broken up due to dependency +cycles, then non-final parts are also created as soon as their +prerequisites are ready. In such a case, the SymbolChangeset +specifies how much of the branch can be created in each step. + +How just-in-time branch creation works: + +In order to make the "best" set of copies/deletes when creating a +branch, cvs2svn keeps track of two sets of trees while it's making +commits: + + 1. A skeleton mirror of the subversion repository, that is, an + array of revisions, with a tree hanging off each revision. (The + "array" is actually implemented as an anydbm database itself, + mapping string representations of numbers to root keys.) + + 2. A tree for each CVS symbolic name, and the svn file/directory + revisions from which various parts of that tree could be copied. + +Both tree sets live in anydbm databases, using the same basic schema: +unique keys map to marshal.dumps() representations of dictionaries, +which in turn map entry names to other unique keys: + + root_key ==> { entryname1 : entrykey1, entryname2 : entrykey2, ... } + entrykey1 ==> { entrynameX : entrykeyX, ... } + entrykey2 ==> { entrynameY : entrykeyY, ... } + entrykeyX ==> { etc, etc ...} + entrykeyY ==> { etc, etc ...} + +(The leaf nodes -- files -- are also dictionaries, for simplicity.) + +The repository mirror allows cvs2svn to remember what paths exist in +what revisions. + +For details on how branches and tags are created, please see the +docstring of the SymbolingsLogger class (and its methods). + +-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- +- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- - +-*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- -*- + +Some older notes and ideas about cvs2svn. Not deleted, because they +may contain suggestions for future improvements in design.
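The keyed-dictionary schema above can be modeled in a few lines. This is a toy sketch under stated assumptions: the class and method names are invented, and an ordinary dict stands in for the anydbm database, but the node encoding (marshal.dumps of {entry_name: child_key} dictionaries) matches the description:

```python
import marshal

class SkeletonTree:
    """Toy version of cvs2svn's keyed-dictionary schema: each node key
    maps to a marshalled dict of {entry_name: child_key}.  A real run
    stores these records in an anydbm database; a dict stands in here."""

    def __init__(self):
        self.db = {}          # stand-in for the anydbm database
        self._next_key = 0

    def new_node(self, entries=None):
        # Allocate a fresh unique key and store the node's entry dict.
        key = str(self._next_key)
        self._next_key += 1
        self.db[key] = marshal.dumps(entries or {})
        return key

    def entries(self, key):
        # Decode a node back into its {entry_name: child_key} dict.
        return marshal.loads(self.db[key])

    def lookup(self, root_key, path):
        # Walk a '/'-separated path down from ROOT_KEY to a node key.
        key = root_key
        for component in path.split('/'):
            key = self.entries(key)[component]
        return key
```

One such tree per revision (the repository mirror) is what lets cvs2svn answer "what paths existed in revision N" without storing any file contents.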
+ +----------------------------------------------------------------------- + +An email from John Gardiner Myers <jgmyers@speakeasy.net> about some +considerations for the tool. + +------ +From: John Gardiner Myers <jgmyers@speakeasy.net> +Subject: Thoughts on CVS to SVN conversion +To: gstein@lyra.org +Date: Sun, 15 Apr 2001 17:47:10 -0700 + +Some things you may want to consider for a CVS to SVN conversion utility: + +If converting a CVS repository to SVN takes days, it would be good for +the conversion utility to keep its progress state on disk. If the +conversion fails halfway through due to a network outage or power +failure, that would allow the conversion to be resumed where it left off +instead of having to start over from an empty SVN repository. + +It is a short step from there to allowing periodic updates of a +read-only SVN repository from a read/write CVS repository. This allows +the more relaxed conversion procedure: + +1) Create SVN repository writable only by the conversion tool. +2) Update SVN repository from CVS repository. +3) Announce the time of CVS to SVN cutover. +4) Repeat step (2) as needed. +5) Disable commits to CVS repository, making it read-only. +6) Repeat step (2). +7) Enable commits to SVN repository. +8) Wait for developers to move their workspaces to SVN. +9) Decommission the CVS repository. + +You may forward this message or parts of it as you see fit. +------ + +----------------------------------------------------------------------- + +Further design thoughts from Greg Stein <gstein@lyra.org> + +* timestamp the beginning of the process. ignore any commits that + occur after that timestamp; otherwise, you could miss portions of a + commit (e.g. scan A; commit occurs to A and B; scan B; create SVN + revision for items in B; we missed A) + +* the above timestamp can also be used for John's "grab any updates + that were missed in the previous pass." + +* for each file processed, watch out for simultaneous commits.
this + may cause a problem during the reading/scanning/parsing of the file, + or the parse succeeds but the results are garbaged. this could be + fixed with a CVS lock, but I'd prefer read-only access. + + algorithm: get the mtime before opening the file. if an error occurs + during reading, and the mtime has changed, then restart the file. if + the read is successful, but the mtime changed, then restart the + file. + +* use a separate log to track unique branches and non-branched forks + of revision history (Q: is it possible to create, say, 1.4.1.3 + without a "real" branch?). this log can then be used to create a + /branches/ directory in the SVN repository. + + Note: we want to determine some way to coalesce branches across + files. It can't be based on name, though, since the same branch name + could be used in multiple places, yet they are semantically + different branches. Given files R, S, and T with branch B, we can + tie those files' branch B into a "semantic group" whenever we see + commit groups on a branch touching multiple files. Files that + have a (named) branch but no commits on it are simply ignored. For + each "semantic group" of a branch, we'd create a branch based on + their common ancestor, then make the changes on the children as + necessary. For single-file commits to a branch, we could use + heuristics (pathname analysis) to add these to a group (and log what + we did), or we could put them in a "reject" kind of file for a human + to tell us what to do (the human would edit a config file of some + kind to instruct the converter). + +* if we have access to the CVSROOT/history, then we could process tags + properly. otherwise, we can only use heuristics or configuration + info to group up tags (branches can use commits; there are no + commits associated with tags) + +* ideally, we store every bit of data from the ,v files to enable a + complete restoration of the CVS repository.
this could be done by + storing properties with CVS revision numbers and stuff (i.e. all + metadata not already embodied by SVN would go into properties) + +* how do we track the "states"? I presume "dead" is simply deleting + the entry from SVN. what are the other legal states, and do we need + to do anything with them? + +* where do we put the "description"? how about locks, access list, + keyword flags, etc. + +* note that using something like the SourceForge repository will be an + ideal test case. people *move* their repositories there, which means + that all kinds of stuff can be found in those repositories, from + wherever people used to run them, and under whatever development + policies may have been used. + + For example: I found one of the projects with a "permissions 644;" + line in the "gnuplot" repository. Most RCS releases issue warnings + about that (although they properly handle/skip the lines), and CVS + ignores RCS newphrases altogether. + +# vim:tw=70 diff -purNbBwx .svn cvs2svn-1.5.x/doc/making-releases.txt cvs2svn-2.0.0/doc/making-releases.txt --- cvs2svn-1.5.x/doc/making-releases.txt 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/doc/making-releases.txt 2007-08-15 22:53:54.000000000 +0200 @@ -0,0 +1,96 @@ +Making releases +=============== + + Pre-release (repeat as appropriate): + A. Backport changes if appropriate. + B. Update CHANGES. + C. Run the testsuite, check everything is OK. + D. Trial-run ./dist.sh, check the output is sane. + E. Run "www/validate.sh www/*.html" to make sure that the + documentation HTML is valid. + + Notes for specific releases: + + Creating the release: + 1. If this is an A.B.0 release, make a branch: + svn copy http://cvs2svn.tigris.org/svn/cvs2svn/trunk \ + http://cvs2svn.tigris.org/svn/cvs2svn/branches/A.B.x + and then increment the -dev VERSION in cvs2svn_lib/version.py on + trunk. + 2. Set the release number and date in CHANGES on trunk. + 3. Switch to a branch working copy. + 4. 
Merge CHANGES to the release branch. + 5. Make a trial distribution and see that the unit tests run: + ./dist.sh + tar -xzf cvs2svn-A.B.C-dev.tar.gz + cd cvs2svn-A.B.C-dev + ./run-tests.py + cd .. + rm -rf cvs2svn-A.B.C-dev + 6. Set VERSION in cvs2svn_lib/version.py and then run: + svn copy . http://cvs2svn.tigris.org/svn/cvs2svn/tags/A.B.C + 7. Increment the -dev VERSION in cvs2svn_lib/version.py on the + A.B.x branch. + 8. Switch to the tag. + 9. Run: + ./dist.sh + 10. Create a detached signature for the tar file: + gpg --detach-sign -a cvs2svn-A.B.C.tar.gz + + Publishing the release: + 1. Upload tarball and signature to website download area (log in, + go to "Downloads", then "Releases" folder, then "Add new file"). + 2. Move old releases into the 'Old' folder of the download area + (click on the "Edit" link next to the files, then change + Status -> "Obsolete" and Folder -> "Old"). + 3. Create a project announcement on the website. See example + template below. + 4. Send an announcement to announce@cvs2svn.tigris.org. + (users@cvs2svn.tigris.org is subscribed to announce, so there is + no need to send to both lists.) See example template below. + 5. Update the topic on #cvs2svn. + + +Release announcement templates +============================== + +Here are suggested release announcement templates. Fill in the substitutions +as appropriate, and refer to previous announcements for examples. + +Web: +[[[ +cvs2svn VERSION is now released. +<br /> +The MD5 checksum is CHECKSUM +<br /> +For more information see <a +href="http://cvs2svn.tigris.org/source/browse/cvs2svn/tags/VERSION/CHANGES?view=markup" +>CHANGES</a>. +<br /> +Download: <a +href="http://cvs2svn.tigris.org/files/documents/1462/NNNNN/cvs2svn-VERSION.tar.gz" +>cvs2svn-VERSION.tar.gz</a>. +]]] + +Email: +[[[ +Subject: cvs2svn VERSION released +To: announce@cvs2svn.tigris.org +Reply-to: users@cvs2svn.tigris.org + +cvs2svn VERSION is now released. 
+ +BRIEF_SUMMARY_OF_VERSION_HIGHLIGHTS + +For more information see: +http://cvs2svn.tigris.org/source/browse/cvs2svn/tags/VERSION/CHANGES?view=markup + +You can get it here: +http://cvs2svn.tigris.org/files/documents/1462/NNNNN/cvs2svn-VERSION.tar.gz + +The MD5 checksum is CHECKSUM. + +Please send any bug reports and comments to users@cvs2svn.tigris.org. + +YOUR_NAME, on behalf of the cvs2svn development team. +]]] diff -purNbBwx .svn cvs2svn-1.5.x/doc/revision-reader.txt cvs2svn-2.0.0/doc/revision-reader.txt --- cvs2svn-1.5.x/doc/revision-reader.txt 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/doc/revision-reader.txt 2007-08-15 22:53:54.000000000 +0200 @@ -0,0 +1,91 @@ +This file contains a description of the RevisionRecorder / +RevisionExcluder / RevisionReader mechanism. + + +cvs2svn now includes hooks to make it possible to avoid having to +invoke CVS or RCS zillions of times in OutputPass (which is otherwise +the most expensive part of the conversion). Here is a brief +description of how the hooks work. + +Each conversion requires an instance of RevisionReader, whose +responsibility is to produce the text contents of CVS revisions on +demand during OutputPass. The RevisionReader can read the CVS +revision contents directly out of the RCS files during OutputPass. +But additional hooks support the construction of different kinds of +RevisionReader that record the CVS file revisions' contents during +CollectRevsPass then output the contents during OutputPass. (Indeed, +for non-SVN backends, OutputPass might not even require the file +contents.) + +Specifically, the RevisionReader instance can supply instances of two +other classes to help it out: + + RevisionRecorder -- can record the CVS revisions' text during + CollectRevsPass to avoid having to parse the RCS files again + during OutputPass. 
+ + RevisionExcluder -- is informed during FilterSymbolsPass about CVS + revisions that have been excluded from the conversion and will + therefore not be needed during OutputPass. This mechanism can + be used to discard temporary data that will not be required. + +The type of RevisionReader to be used for a run of cvs2svn can be set +using --use-internal-co, --use-rcs, or --use-cvs, or via the --options +file with a line like: + + ctx.revision_reader = MyRevisionReader() + +The following RevisionReaders are supplied with cvs2svn: + + InternalRevisionReader -- an InternalRevisionRecorder records the + revisions' delta text and their dependencies during + CollectRevsPass; an InternalRevisionExcluder discards unneeded + deltas in FilterSymbolsPass; an InternalRevisionReader + reconstitutes the revisions' contents during OutputPass from + the recorded data. This is by far the fastest option, but it + requires a substantial amount of temporary disk space for the + duration of the conversion. + + RCSRevisionReader -- uses RCS's "co" command to extract the + revision text during OutputPass. This is slower than + InternalRevisionReader because "co" has to be executed very + many times, but is better tested and does not require any + temporary disk space. RCSRevisionReader does not use a + RevisionRecorder or RevisionExcluder. + + CVSRevisionReader -- uses the "cvs" command to extract the + revision text during OutputPass. This is even slower than + RCSRevisionReader, but it can handle some CVS file quirks + that stymie RCSRevisionReader (see the cvs2svn HTML + documentation). CVSRevisionReader does not use a + RevisionRecorder or RevisionExcluder. + + +It is possible to write your own RevisionReader if you would like to +do things differently. A RevisionReader that wants to record +information during CollectRevsPass should define a method +get_revision_recorder(), which should return an instance of +RevisionRecorder.
A RevisionRecorder has callback methods that are +invoked as the CVS files are parsed. For example, +RevisionRecorder.record_text() is passed the log message and text +(full text or delta) for each file revision. The record_text() method +is allowed to return an arbitrary token (for example, a content hash), +and that token is stored into CVSRevision.revision_recorder_token and +carried along by cvs2svn. + +A RevisionReader that wants to be informed about revisions that will +be excluded from the conversion should define a method +get_revision_excluder(), which returns an instance of +RevisionExcluder. The RevisionExcluder's callbacks are invoked during +FilterSymbolsPass to tell it which CVS revisions will actually be +needed by the conversion. The RevisionExcluder has the opportunity to +use this information to delete unneeded temporary data. + +Later, when OutputPass requires the file contents, it calls +RevisionReader.get_content_stream(), which is passed a CVSRevision +instance and has to return a stream object that produces the file +revision's contents. The fancy RevisionReader could use the token to +retrieve the pre-stored file contents without having to call CVS or +RCS at all. + + diff -purNbBwx .svn cvs2svn-1.5.x/doc/symbol-notes.txt cvs2svn-2.0.0/doc/symbol-notes.txt --- cvs2svn-1.5.x/doc/symbol-notes.txt 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/doc/symbol-notes.txt 2007-08-15 22:53:54.000000000 +0200 @@ -0,0 +1,332 @@ +This is a description of how symbols (tags and branches) are handled +by cvs2svn, determined by reading the code. + + +Notation +======== + + CVSFile -- a single file within the CVS repository. This object + basically only records the filename of the corresponding RCS + file, and the relative filename that this file will have + within the SVN repository. A single CVSFile object is used + for all of the CVSItems on all lines of development related to + that file. 
+ + +The following terms and the corresponding classes represent +project-wide concepts. For example, a project will only have a single +Branch named "foo" even if many files appear on that branch. Each of +these objects is assigned a unique integer ID during CollectRevsPass +which is preserved during the entire conversion (even if, say, a +Branch is mutated into a Tag). + + Trunk -- the main line of development for a particular Project in + CVS. The Trunk class inherits from LineOfDevelopment. + + Symbol -- a Branch or a Tag within a particular Project (see + below). Instances of this class are also used to represent + symbols early in the conversion, before it has been decided + whether to convert the symbol as a Branch or as a Tag. A + Symbol contains an id, a Project, and a name. + + Branch -- a symbol within a particular Project that will be + treated as a branch in SVN. Usually corresponds to a branch + tag in CVS, but might be a non-branch tag that was mutated in + CollateSymbolsPass. In SVN, this will correspond to a + subdirectory of the project's "branches" directory. The + Branch class inherits from Symbol and from LineOfDevelopment. + + Tag -- a symbol within a particular Project that will be treated + as a tag in SVN. Usually corresponds to a non-branch tag in + CVS, but might be a branch tag that was mutated in + CollateSymbolsPass. In SVN, this will correspond to a + subdirectory of the project's "tags" directory. The Tag + class inherits from Symbol and from LineOfDevelopment. + + ExcludedSymbol -- a CVS symbol that will be excluded from the + cvs2svn output. + + LineOfDevelopment -- a Trunk, Branch, or Tag. + + +The following terms and the corresponding classes represent particular +CVS events in particular CVS files. For example, the CVSBranch +representing the creation of Branch "foo" in one file will be distinct +from the CVSBranch representing the creation of branch "foo" in +another file, even if the two files are in the same Project.
Each +CVSItem is assigned a unique integer ID during CollectRevsPass which +is preserved during the entire conversion (even if, say, a CVSBranch +is mutated into a CVSTag). + + CVSItem -- abstract base class representing any discernible event + within a single RCS file, for example the creation of revision + 1.6, or the tagging of the file with tag "bar". Each CVSItem + has a unique integer ID. + + CVSRevision -- a particular revision within a particular file + (e.g., file.txt:1.6). A CVSRevision occurs on a particular + LineOfDevelopment. CVSRevision inherits from CVSItem. + + CVSSymbol -- a CVSBranch or CVSTag (see below). CVSSymbol + inherits from CVSItem. + + CVSBranch -- the creation of a particular Branch on a particular + file. A CVSBranch has a Symbol instance telling the Symbol + associated with the branch, and also records the + LineOfDevelopment from which the branch was created. In the + SVN repository, a CVSBranch corresponds to an "svn copy" of a + file to a subdirectory of the project's "branches" directory. + CVSBranch inherits from CVSSymbol. + + CVSTag -- the creation of a particular Tag on a particular file. + A CVSTag has a Symbol instance telling the Symbol associated + with the tag, and also records the LineOfDevelopment from + which the tag was created. In the SVN repository, a CVSTag + corresponds to an "svn copy" of a file to a subdirectory of + the project's "tags" directory. CVSTag inherits from + CVSSymbol. + + +CollectRevsPass +=============== + +Collect all information about CVS tags and branches from the CVS +repository. + +For each project, create a Trunk object to represent the trunk line of +development for that project. The Trunk object for one Project is +distinct from the Trunk objects for other Projects. For each symbol +name seen in each project, create a Symbol object. The Symbol object +contains its id, project, and name. 
+ +For each Symbol object, collect the following statistics: + + * In how many files was the symbol used as a branch and in how many + was it used as a tag. + + * In how many files was there a commit on a branch with that name. + + * Which other symbols branched off of a branch with that name. + + * In how many files could each other line of development have served + as the source of this symbol. These are called the "possible + parents" of the symbol. + +These statistics are used in CollateSymbolsPass to determine which +symbols can be excluded or converted from tags to branches or vice +versa. + +The possible parents information is important because CVS is ambiguous +about what line of development was the source of a branch. A branch +numbered 1.3.6 might have been created from trunk (revision 1.3), from +branch 1.3.2, or from branch 1.3.4; it is simply impossible to tell +based on the information in a single RCS file. + +[Actually, the situation is even more confusing. If a branch tag is +deleted from CVS, the branch number is recycled. So it is even +possible that branch 1.3.6 was created from branch 1.3.8 or 1.3.10 or +... We address this confusion by noting the order that the branches +were listed in the RCS file header. It appears that CVS lists +branches in the header in reverse chronological order of creation.] + +For each tag seen within each file, create a CVSTag object recording +its id, CVSFile, Symbol, and the id of the CVSRevision being tagged. + +For each branch seen within each file, create a CVSBranch object +recording an id, CVSFile, Symbol, the branch number (e.g., '1.4.2'), +the id of the CVSRevision from which the branch sprouts, and the id of +the first CVSRevision on the branch (if any). 
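The per-symbol statistics listed above amount to a handful of counters. A sketch of such a tally follows; the class and attribute names are illustrative, not cvs2svn's actual ones:

```python
from collections import Counter

class SymbolStats:
    """Per-symbol statistics gathered during CollectRevsPass (a sketch;
    names are invented for illustration)."""

    def __init__(self, name):
        self.name = name
        self.tag_file_count = 0       # files where the symbol is a tag
        self.branch_file_count = 0    # files where the symbol is a branch
        self.branch_commit_count = 0  # files with commits on the branch
        # Candidate source lines of development, counted per file:
        self.possible_parents = Counter()

    def register_tag(self):
        self.tag_file_count += 1

    def register_branch(self, possible_parents, had_commits=False):
        self.branch_file_count += 1
        if had_commits:
            self.branch_commit_count += 1
        self.possible_parents.update(possible_parents)
```

CollateSymbolsPass then reads these numbers to decide whether each symbol becomes a Branch, a Tag, or an ExcludedSymbol.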
+ +For each revision seen within each file, create a CVSRevision object +recording (among other things) an id, the line of development (trunk +or branch) on which the revision appeared, a list of ids of CVSTags +tagging the revision, and a list of ids of CVSBranches sprouting from +the revision. + +This pass also adjusts the CVS dependency tree to work around some CVS +quirks. (See design-notes.txt for the details.) These adjustments +can result in CVSBranches being deleted, for example, if a file was +added on a branch. In such a case, any CVSRevisions that were +previously on the branch will be created by adding the file to the +branch directory, rather than copying the file from the source +directory to the branch directory. + + +CollateSymbolsPass +================== + +Use the symbol statistics collected in CollectRevsPass and +Ctx().symbol_strategy to decide which symbols should be treated as +branches, which as tags, and which symbols should be excluded from the +conversion altogether. Consistency checks prevent, for example, a +branch from being converted into a tag if there were commits on the +branch. + +This pass creates the symbol database, SYMBOL_DB, which is accessed in +later passes via the SymbolDatabase class. The SymbolDatabase +contains TypedSymbol (Branch, Tag, or ExcludedSymbol) instances +indicating how each symbol should be processed in the conversion. The +ID used for a TypedSymbol is the same as the ID allocated to the +corresponding symbol in CollectRevsPass, so references in CVSItems do +not have to be updated. + +This pass also chooses and records the preferred parent for each +Symbol. The preferred parent of a Symbol is the line of development +that appeared as a possible parent of this symbol in the most +CVSFiles.
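Both CollateSymbolsPass decisions described above reduce to simple counting. A self-contained sketch, with invented function names and a simplified 'branch'/'tag'/'exclude' encoding in place of the TypedSymbol classes:

```python
from collections import Counter

def choose_preferred_parent(parent_counts):
    # parent_counts: Counter mapping each possible parent line of
    # development to the number of CVSFiles in which it could have
    # served as the symbol's source.  The preferred parent is simply
    # the most frequent candidate.
    return parent_counts.most_common(1)[0][0]

def check_symbol_conversion(name, branch_commit_count, convert_as):
    # Consistency check from the description above: a branch that has
    # commits on it cannot be converted into a tag.
    if convert_as == 'tag' and branch_commit_count > 0:
        raise ValueError(
            "symbol '%s' has commits on it and cannot be converted "
            "as a tag" % name)
    return convert_as
```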
+ + +FilterSymbolsPass +================= + +Iterate through all of the CVSItems, mutating CVSTags to CVSBranches +and vice versa and excluding other CVSSymbols as specified by the +types of the TypedSymbols in the SymbolDatabase. Additionally, filter +out any CVSRevisions that reside on excluded CVSBranches. + +Write a line of text to CVS_SYMBOLS_SUMMARY_DATAFILE for each +surviving CVSSymbol, containing its Symbol id and CVSItem id. (This +file will be sorted in SortSymbolSummaryPass then used in +InitializeChangesetsPass to create SymbolChangesets.) + +Also adjust the file's dependency tree by grafting CVSSymbols onto +their preferred parents. This is not always possible; if not, leave +the CVSSymbol where it was. + +Finally, record symbol "openings" and "closings". A CVSSymbol is +considered "opened" by the CVSRevision or CVSBranch from which the +CVSSymbol sprouts. A CVSSymbol is considered "closed" by the +CVSRevision that overwrites or deletes the CVSSymbol's opening. +(Every CVSSymbol has an opening, but not all of them have closings; +for example, the opening CVSRevision might still exist at HEAD.) +Record in each CVSRevision and CVSBranch a list of all of the +CVSSymbols that it opens. Record in each CVSRevision a list of all of +the CVSSymbols that it closes (CVSBranches cannot close CVSSymbols). + + +SortRevisionSummaryPass +======================= + +N/A + + +SortSymbolSummaryPass +===================== + +Sort CVS_SYMBOLS_SUMMARY_DATAFILE, creating +CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE. The sort groups together symbol +items that might be added to the same SymbolChangeset. + + +InitializeChangesetsPass +======================== + +Read CVS_SYMBOLS_SUMMARY_SORTED_DATAFILE, grouping CVSSymbol items +with the same Symbol id into SymbolChangesets. The SymbolChangesets +are currently unused. 
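The grouping step just described follows directly from the sort order: once the summary file is sorted by symbol id, consecutive lines with the same id form one SymbolChangeset. A sketch using itertools.groupby, with the line format simplified to 'SYMBOL_ID CVSITEM_ID' and a plain tuple standing in for the real SymbolChangeset class:

```python
from itertools import groupby

def group_symbol_changesets(sorted_summary_lines):
    # Because the input is sorted by symbol id, one pass with groupby
    # is enough: each run of equal ids becomes one changeset, here a
    # (symbol_id, [cvs_item_ids]) tuple.
    changesets = []
    parsed = (line.split() for line in sorted_summary_lines)
    for symbol_id, rows in groupby(parsed, key=lambda row: row[0]):
        changesets.append((symbol_id, [row[1] for row in rows]))
    return changesets
```

This is why SortSymbolSummaryPass runs first: without the sort, members of one changeset would be scattered through the file and could not be collected in a single sequential read.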
+ + +BreakCVSRevisionChangesetCyclesPass +=================================== + +N/A + + +RevisionTopologicalSortPass +=========================== + +N/A + + +BreakCVSSymbolChangesetCyclesPass +================================= + +Read in the changeset graph consisting only of SymbolChangesets and +break any cycles that are found by breaking up symbol changesets. + + +BreakAllChangesetCyclesPass +=========================== + +Read in the entire changeset graph and break any cycles that are found +by breaking up symbol changesets. + + +TopologicalSortPass +=================== + +Update the conversion statistics with excluded symbols omitted. + + +CreateRevsPass +============== + +Create SVNCommits and assign svn revision numbers to each one. Create +a database (SVN_COMMITS_DB) to map svn revision numbers to SVNCommits +and another (CVS_REVS_TO_SVN_REVNUMS) to map each CVSRevision id to +the number of the svn revision containing it. + +Also, SymbolingsLogger writes a line to SYMBOL_OPENINGS_CLOSINGS for +each opening or closing for each CVSSymbol, noting in what SVN +revision the opening or closing occurred. + + +SortSymbolsPass +=============== + +This pass sorts SYMBOL_OPENINGS_CLOSINGS into +SYMBOL_OPENINGS_CLOSINGS_SORTED. This orders the file first by symbol +ID, and second by Subversion revision number, thus grouping all +openings and closings for each symbolic name together. + + +IndexSymbolsPass +================ + +Iterate through all the lines in SYMBOL_OPENINGS_CLOSINGS_SORTED, +writing out a pickled map to SYMBOL_OFFSETS_DB telling at what offset +in SYMBOL_OPENINGS_CLOSINGS_SORTED the lines corresponding to each +Symbol begin. This will allow us to seek to the various offsets in +the file and sequentially read only the openings and closings that we +need. + + +OutputPass +========== + +The filling of a symbol is triggered when SVNSymbolCommit.commit() +calls SVNRepositoryMirror.fill_symbol().
The SVNSymbolCommit contains +the list of CVSSymbols that have to be copied to a symbol directory in +this revision. However, we still have to do a lot of work to figure +out what SVN revision number to use as the source of these copies, and +also to group file copies together into directory copies when +possible. + +The SYMBOL_OPENINGS_CLOSINGS_SORTED file lists the opening and closing +SVN revision of each revision that has to be copied to the symbol +directory. We use this information to try to find SVN revision +numbers that can serve as the source for as many files as possible, to +avoid having to pick and choose sources from many SVN revisions. + +Furthermore, when a bunch of files in a directory have to be copied at +the same time, it is cheaper to copy the directory as a whole. But if +not *all* of the files within the directory had to be copied, then the +unneeded files have to be deleted again from the copied directory. Or +if some of the files have to be copied from different source SVN +revision numbers, then those files have to be overwritten in the +copied directory with the correct versions. + +Finally, it can happen that a single Symbol has to be filled multiple +times (because the initial SymbolChangeset had to be broken up). In +this case, the first fill can copy the source directory to the +destination directory (maybe with fixups), but subsequent copies have +to copy individual files to avoid overwriting content that is already +present in the destination directory. + +To figure all of this out, we need to know all of the files that +existed in every previous SVN revision, in every line of development. +This is done using the SVNRepositoryMirror class, which keeps a +skeleton record of the entire SVN history in a database using data +structures similar to those used by SVN itself. 
+ + diff -purNbBwx .svn cvs2svn-1.5.x/profile-repos.py cvs2svn-2.0.0/profile-repos.py --- cvs2svn-1.5.x/profile-repos.py 2006-09-10 19:43:25.000000000 +0200 +++ cvs2svn-2.0.0/profile-repos.py 2007-08-15 22:53:54.000000000 +0200 @@ -1,6 +1,6 @@ #!/usr/bin/env python # ==================================================================== -# Copyright (c) 2000-2004 CollabNet. All rights reserved. +# Copyright (c) 2000-2006 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -22,7 +22,7 @@ on that repository. NOTE: You have to r import sys, os, os.path -from cvs2svn_lib.database import DB_OPEN_READ +from cvs2svn_lib.common import DB_OPEN_READ from cvs2svn_lib.config import CVS_FILES_DB from cvs2svn_lib.config import CVS_ITEMS_DB from cvs2svn_lib.config import CVS_ITEMS_ALL_DATAFILE diff -purNbBwx .svn cvs2svn-1.5.x/run-tests.py cvs2svn-2.0.0/run-tests.py --- cvs2svn-1.5.x/run-tests.py 2006-10-03 12:33:21.000000000 +0200 +++ cvs2svn-2.0.0/run-tests.py 2007-08-15 22:53:54.000000000 +0200 @@ -23,7 +23,7 @@ # See http://subversion.tigris.org for more information. # # ==================================================================== -# Copyright (c) 2000-2004 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -57,10 +57,10 @@ if not (os.path.exists('cvs2svn') and os # Load the Subversion test framework. 
import svntest
-
-# Abbreviations
-Skip = svntest.testcase.Skip
-XFail = svntest.testcase.XFail
+from svntest import Failure
+from svntest.testcase import TestCase
+from svntest.testcase import Skip
+from svntest.testcase import XFail
 
 cvs2svn = os.path.abspath('cvs2svn')
 
@@ -72,7 +72,7 @@ svn = 'svn'
 svnlook = 'svnlook'
 
 test_data_dir = 'test-data'
-tmp_dir = 'tmp'
+tmp_dir = 'cvs2svn-tmp'
 
 #----------------------------------------------------------------------
 
@@ -80,12 +80,21 @@ tmp_dir = 'tmp'
 
 #----------------------------------------------------------------------
 
-class RunProgramException:
-  pass
+# The value to expect for svn:keywords if it is set:
+KEYWORDS = 'Author Date Id Revision'
 
-class MissingErrorException:
+
+class RunProgramException(Failure):
   pass
 
+
+class MissingErrorException(Failure):
+  def __init__(self, error_re):
+    Failure.__init__(
+        self, "Test failed because no error matched '%s'" % (error_re,)
+        )
+
+
 def run_program(program, error_re, *varargs):
   """Run PROGRAM with VARARGS, return stdout as a list of lines.
  If there is any stderr and ERROR_RE is None, raise
@@ -101,14 +110,14 @@ def run_program(program, error_re, *vara
     for line in err:
       if re.match(error_re, line):
         return out
-    raise MissingErrorException
+    raise MissingErrorException(error_re)
   else:
     if svntest.main.verbose_mode:
       print '\n%s said:\n' % program
       for line in err:
         print ' ' + line,
       print
-    raise RunProgramException
+    raise RunProgramException()
 
   return out
 
@@ -218,7 +227,7 @@ class Log:
   def check_msg(self, msg):
     """Verify that this Log's message starts with the specified MSG."""
     if self.msg.find(msg) != 0:
-      raise svntest.Failure(
+      raise Failure(
          "Revision %d log message was:\n%s\n\n"
          "It should have begun with:\n%s\n\n"
          % (self.revision, self.msg, msg,)
@@ -233,13 +242,13 @@ class Log:
     path = path % self.symbols
     found_op = self.changed_paths.get(path, None)
     if found_op is None:
-      raise svntest.Failure(
+      raise Failure(
          "Revision %d does not include change for path %s "
          "(it should have been %s).\n"
          % (self.revision, path, op,)
          )
     if found_op != op:
-      raise svntest.Failure(
+      raise Failure(
          "Revision %d path %s had op %s (it should have been %s)\n"
          % (self.revision, path, found_op, op,)
          )
@@ -256,7 +265,7 @@ class Log:
       cp[path % self.symbols] = op
 
     if self.changed_paths != cp:
-      raise svntest.Failure(
+      raise Failure(
          "Revision %d changed paths list was:\n%s\n\n"
          "It should have been:\n%s\n\n"
          % (self.revision, self.changed_paths, cp,)
@@ -429,6 +438,8 @@ class Conversion:
   symbols -- a dictionary of symbols used for string interpolation
       in path names.
 
+  stdout -- a list of lines written by cvs2svn to stdout
+
   _wc -- the basename of the svn working copy (within tmp_dir).
 
   _wc_path -- the path to the svn working copy, if it has already
@@ -441,6 +452,20 @@ class Conversion:
   _svnrepos -- the basename of the svn repository (within tmp_dir)."""
 
+  # The number of the last cvs2svn pass (determined lazily by
+  # get_last_pass()).
+ last_pass = None + + def get_last_pass(cls): + """Return the number of cvs2svn's last pass.""" + + if cls.last_pass is None: + out = run_cvs2svn(None, '--help-passes') + cls.last_pass = int(out[-1].split()[0]) + return cls.last_pass + + get_last_pass = classmethod(get_last_pass) + def __init__(self, conv_id, name, error_re, passbypass, symbols, args, options_file=None): self.conv_id = conv_id @@ -462,9 +487,12 @@ class Conversion: if options_file is None: self.options_file = None + if tmp_dir != 'cvs2svn-tmp': + # Only include this argument if it differs from cvs2svn's default: args.extend([ '--tmpdir=%s' % tmp_dir, - '--bdb-txn-nosync', + ]) + args.extend([ '-s', self.repos, cvsrepos, ]) @@ -474,23 +502,34 @@ class Conversion: '--options=%s' % self.options_file, ]) - try: if passbypass: - for p in range(1, 10): - run_cvs2svn(error_re, '-p', str(p), *args) + self.stdout = [] + for p in range(1, self.get_last_pass() + 1): + self.stdout += run_cvs2svn(error_re, '-p', str(p), *args) else: - run_cvs2svn(error_re, *args) - except RunProgramException: - raise svntest.Failure - except MissingErrorException: - raise svntest.Failure("Test failed because no error matched '%s'" - % error_re) + self.stdout = run_cvs2svn(error_re, *args) - if not os.path.isdir(self.repos): - raise svntest.Failure("Repository not created: '%s'" + if os.path.isdir(self.repos): + self.logs = parse_log(self.repos, self.symbols) + elif error_re is None: + raise Failure("Repository not created: '%s'" % os.path.join(os.getcwd(), self.repos)) - self.logs = parse_log(self.repos, self.symbols) + + def output_found(self, pattern): + """Return True if PATTERN matches any line in self.stdout. + + PATTERN is a regular expression pattern as a string. + """ + + pattern_re = re.compile(pattern) + + for line in self.stdout: + if pattern_re.match(line): + # We found the pattern that we were looking for. 
+ return 1 + else: + return 0 def find_tag_log(self, tagname): """Search LOGS for a log message containing 'TAGNAME' and return the @@ -540,7 +579,7 @@ class Conversion: props = props_for_path(self.get_wc_tree(), file) for i in range(len(keys)): if props.get(keys[i]) != values[i]: - raise svntest.Failure( + raise Failure( "File %s has property %s set to \"%s\" " "(it should have been \"%s\").\n" % (file, keys[i], props.get(keys[i]), values[i],) @@ -563,15 +602,15 @@ def ensure_conversion(name, error_re=Non If ERROR_RE is a string, it is a regular expression expected to match some line of stderr printed by the conversion. If there is an - error and ERROR_RE is not set, then raise svntest.Failure. + error and ERROR_RE is not set, then raise Failure. If PASSBYPASS is set, then cvs2svn is run multiple times, each time with a -p option starting at 1 and increasing to a (hardcoded) maximum. NAME is just one word. For example, 'main' would mean to convert './test-data/main-cvsrepos', and after the conversion, the resulting - Subversion repository would be in './tmp/main-svnrepos', and a - checked out head working copy in './tmp/main-wc'. + Subversion repository would be in './cvs2svn-tmp/main-svnrepos', and + a checked out head working copy in './cvs2svn-tmp/main-wc'. Any other options to pass to cvs2svn should be in ARGS, each element being one option, e.g., '--trunk-only'. If the option takes an @@ -613,7 +652,7 @@ def ensure_conversion(name, error_re=Non conv_id, name, error_re, passbypass, {'trunk' : trunk, 'branches' : branches, 'tags' : tags}, args, options_file) - except svntest.Failure: + except Failure: # Remember the failure so that a future attempt to run this conversion # does not bother to retry, but fails immediately. 
already_converted[conv_id] = None @@ -621,10 +660,79 @@ def ensure_conversion(name, error_re=Non conv = already_converted[conv_id] if conv is None: - raise svntest.Failure + raise Failure() return conv +class Cvs2SvnTestCase(TestCase): + def __init__(self, name, description=None, variant=None, + error_re=None, passbypass=None, + trunk=None, branches=None, tags=None, + args=None, options_file=None): + TestCase.__init__(self) + self.name = name + + if description is not None: + self._description = description + else: + # By default, use the first line of the class docstring as the + # description: + self._description = self.__doc__.splitlines()[0] + + # Check that the original description is OK before we tinker with + # it: + self.check_description() + + if variant is not None: + # Modify description to show the variant. Trim description + # first if necessary to stay within the 50-character limit. + suffix = '...variant %s' % (variant,) + self._description = self._description[:50 - len(suffix)] + suffix + # Check that the description is still OK: + self.check_description() + + self.error_re = error_re + self.passbypass = passbypass + self.trunk = trunk + self.branches = branches + self.tags = tags + self.args = args + self.options_file = options_file + + def get_description(self): + return self._description + + def ensure_conversion(self): + return ensure_conversion( + self.name, + error_re=self.error_re, passbypass=self.passbypass, + trunk=self.trunk, branches=self.branches, tags=self.tags, + args=self.args, options_file=self.options_file) + + +class Cvs2SvnPropertiesTestCase(Cvs2SvnTestCase): + """Test properties resulting from a conversion.""" + + def __init__(self, name, props_to_test, expected_props, **kw): + """Initialize an instance of Cvs2SvnPropertiesTestCase. + + NAME is the name of the test, passed to Cvs2SvnTestCase. + PROPS_TO_TEST is a list of the names of svn properties that should + be tested. 
EXPECTED_PROPS is a list of tuples [(filename, + [value,...])], where the second item in each tuple is a list of + values expected for the properties listed in PROPS_TO_TEST for the + specified filename. If a property must *not* be set, then its + value should be listed as None.""" + + Cvs2SvnTestCase.__init__(self, name, **kw) + self.props_to_test = props_to_test + self.expected_props = expected_props + + def run(self): + conv = self.ensure_conversion() + conv.check_props(self.props_to_test, self.expected_props) + + #---------------------------------------------------------------------- # Tests. #---------------------------------------------------------------------- @@ -638,32 +746,32 @@ def show_usage(): print 'cvs2svn cannot execute due to lack of proper DBM module.' print 'Exiting without running any further tests.' sys.exit(1) - if out[0].find('USAGE') < 0: - raise svntest.Failure('Basic cvs2svn invocation failed.') + if out[0].find('Usage:') < 0: + raise Failure('Basic cvs2svn invocation failed.') def show_help_passes(): "cvs2svn --help-passes shows pass information" out = run_cvs2svn(None, '--help-passes') if out[0].find('PASSES') < 0: - raise svntest.Failure('cvs2svn --help-passes failed.') + raise Failure('cvs2svn --help-passes failed.') def attr_exec(): "detection of the executable flag" if sys.platform == 'win32': - raise svntest.Skip + raise svntest.Skip() conv = ensure_conversion('main') st = os.stat(conv.get_wc('trunk', 'single-files', 'attr-exec')) if not st[0] & stat.S_IXUSR: - raise svntest.Failure + raise Failure() def space_fname(): "conversion of filename with a space" conv = ensure_conversion('main') if not conv.path_exists('trunk', 'single-files', 'space fname'): - raise svntest.Failure + raise Failure() def two_quick(): @@ -672,11 +780,16 @@ def two_quick(): logs = parse_log( os.path.join(conv.repos, 'trunk', 'single-files', 'twoquick'), {}) if len(logs) != 2: - raise svntest.Failure + raise Failure() -def prune_with_care(**kw): +class 
PruneWithCare(Cvs2SvnTestCase): "prune, but never too much" + + def __init__(self, **kw): + Cvs2SvnTestCase.__init__(self, 'main', **kw) + + def run(self): # Robert Pluim encountered this lovely one while converting the # directory src/gnu/usr.bin/cvs/contrib/pcl-cvs/ in FreeBSD's CVS # repository (see issue #1302). Step 4 is the doozy: @@ -710,7 +823,7 @@ def prune_with_care(**kw): # In the test below, 'trunk/full-prune/first' represents # cookie, and 'trunk/full-prune/second' represents NEWS. - conv = ensure_conversion('main', **kw) + conv = self.ensure_conversion() # Confirm that revision 4 removes '/trunk/full-prune/first', # and that revision 6 removes '/trunk/full-prune'. @@ -724,29 +837,19 @@ def prune_with_care(**kw): # pruning from going farther than the subdirectory containing first # and second. - rev = 11 - for path in ('/%(trunk)s/full-prune/first', - '/%(trunk)s/full-prune-reappear/sub/first', - '/%(trunk)s/partial-prune/sub/first'): - conv.logs[rev].check_change(path, 'D') - - rev = 13 - for path in ('/%(trunk)s/full-prune', - '/%(trunk)s/full-prune-reappear', - '/%(trunk)s/partial-prune/sub'): - conv.logs[rev].check_change(path, 'D') - - rev = 47 - for path in ('/%(trunk)s/full-prune-reappear', - '/%(trunk)s/full-prune-reappear/appears-later'): - conv.logs[rev].check_change(path, 'A') - - -def prune_with_care_variants(): - "prune, with alternate repo layout" - prune_with_care(trunk='a', branches='b', tags='c') - prune_with_care(trunk='a/1', branches='b/1', tags='c/1') - prune_with_care(trunk='a/1', branches='a/2', tags='a/3') + for path in ('full-prune/first', + 'full-prune-reappear/sub/first', + 'partial-prune/sub/first'): + conv.logs[8].check_change('/%(trunk)s/' + path, 'D') + + for path in ('full-prune', + 'full-prune-reappear', + 'partial-prune/sub'): + conv.logs[10].check_change('/%(trunk)s/' + path, 'D') + + for path in ('full-prune-reappear', + 'full-prune-reappear/appears-later'): + conv.logs[40].check_change('/%(branches)s/vendorbranch/' + 
path, 'A') def interleaved_commits(): @@ -755,19 +858,19 @@ def interleaved_commits(): conv = ensure_conversion('main') # The initial import. - rev = 37 - conv.logs[rev].check('Initial revision', ( - ('/%(trunk)s/interleaved', 'A'), - ('/%(trunk)s/interleaved/1', 'A'), - ('/%(trunk)s/interleaved/2', 'A'), - ('/%(trunk)s/interleaved/3', 'A'), - ('/%(trunk)s/interleaved/4', 'A'), - ('/%(trunk)s/interleaved/5', 'A'), - ('/%(trunk)s/interleaved/a', 'A'), - ('/%(trunk)s/interleaved/b', 'A'), - ('/%(trunk)s/interleaved/c', 'A'), - ('/%(trunk)s/interleaved/d', 'A'), - ('/%(trunk)s/interleaved/e', 'A'), + rev = 32 + conv.logs[rev].check('Initial import.', ( + ('/%(branches)s/vendorbranch/interleaved', 'A'), + ('/%(branches)s/vendorbranch/interleaved/1', 'A'), + ('/%(branches)s/vendorbranch/interleaved/2', 'A'), + ('/%(branches)s/vendorbranch/interleaved/3', 'A'), + ('/%(branches)s/vendorbranch/interleaved/4', 'A'), + ('/%(branches)s/vendorbranch/interleaved/5', 'A'), + ('/%(branches)s/vendorbranch/interleaved/a', 'A'), + ('/%(branches)s/vendorbranch/interleaved/b', 'A'), + ('/%(branches)s/vendorbranch/interleaved/c', 'A'), + ('/%(branches)s/vendorbranch/interleaved/d', 'A'), + ('/%(branches)s/vendorbranch/interleaved/e', 'A'), )) # This PEP explains why we pass the 'log' parameter to these two @@ -797,11 +900,11 @@ def interleaved_commits(): # One of the commits was letters only, the other was numbers only. # But they happened "simultaneously", so we don't assume anything # about which commit appeared first, so we just try both ways. - rev = rev + 3 + rev += 2 try: check_letters(conv.logs[rev]) check_numbers(conv.logs[rev + 1]) - except svntest.Failure: + except Failure: check_numbers(conv.logs[rev]) check_letters(conv.logs[rev + 1]) @@ -812,34 +915,31 @@ def simple_commits(): conv = ensure_conversion('main') # The initial import. 
- rev = 23 - conv.logs[rev].check('Initial revision', ( - ('/%(trunk)s/proj', 'A'), - ('/%(trunk)s/proj/default', 'A'), - ('/%(trunk)s/proj/sub1', 'A'), - ('/%(trunk)s/proj/sub1/default', 'A'), - ('/%(trunk)s/proj/sub1/subsubA', 'A'), - ('/%(trunk)s/proj/sub1/subsubA/default', 'A'), - ('/%(trunk)s/proj/sub1/subsubB', 'A'), - ('/%(trunk)s/proj/sub1/subsubB/default', 'A'), - ('/%(trunk)s/proj/sub2', 'A'), - ('/%(trunk)s/proj/sub2/default', 'A'), - ('/%(trunk)s/proj/sub2/subsubA', 'A'), - ('/%(trunk)s/proj/sub2/subsubA/default', 'A'), - ('/%(trunk)s/proj/sub3', 'A'), - ('/%(trunk)s/proj/sub3/default', 'A'), + conv.logs[18].check('Initial import.', ( + ('/%(branches)s/vendorbranch/proj', 'A'), + ('/%(branches)s/vendorbranch/proj/default', 'A'), + ('/%(branches)s/vendorbranch/proj/sub1', 'A'), + ('/%(branches)s/vendorbranch/proj/sub1/default', 'A'), + ('/%(branches)s/vendorbranch/proj/sub1/subsubA', 'A'), + ('/%(branches)s/vendorbranch/proj/sub1/subsubA/default', 'A'), + ('/%(branches)s/vendorbranch/proj/sub1/subsubB', 'A'), + ('/%(branches)s/vendorbranch/proj/sub1/subsubB/default', 'A'), + ('/%(branches)s/vendorbranch/proj/sub2', 'A'), + ('/%(branches)s/vendorbranch/proj/sub2/default', 'A'), + ('/%(branches)s/vendorbranch/proj/sub2/subsubA', 'A'), + ('/%(branches)s/vendorbranch/proj/sub2/subsubA/default', 'A'), + ('/%(branches)s/vendorbranch/proj/sub3', 'A'), + ('/%(branches)s/vendorbranch/proj/sub3/default', 'A'), )) # The first commit. - rev = 30 - conv.logs[rev].check('First commit to proj, affecting two files.', ( + conv.logs[24].check('First commit to proj, affecting two files.', ( ('/%(trunk)s/proj/sub1/subsubA/default', 'M'), ('/%(trunk)s/proj/sub3/default', 'M'), )) # The second commit. 
- rev = 31 - conv.logs[rev].check('Second commit to proj, affecting all 7 files.', ( + conv.logs[25].check('Second commit to proj, affecting all 7 files.', ( ('/%(trunk)s/proj/default', 'M'), ('/%(trunk)s/proj/sub1/default', 'M'), ('/%(trunk)s/proj/sub1/subsubA/default', 'M'), @@ -850,66 +950,79 @@ def simple_commits(): )) -def simple_tags(**kw): - "simple tags and branches with no commits" +class SimpleTags(Cvs2SvnTestCase): + "simple tags and branches, no commits" + + def __init__(self, **kw): # See test-data/main-cvsrepos/proj/README. - conv = ensure_conversion('main', **kw) + Cvs2SvnTestCase.__init__(self, 'main', **kw) + + def run(self): + conv = self.ensure_conversion() # Verify the copy source for the tags we are about to check # No need to verify the copyfrom revision, as simple_commits did that - conv.logs[24].check(sym_log_msg('vendorbranch'), ( - ('/%(branches)s/vendorbranch/proj (from /%(trunk)s/proj:23)', 'A'), + conv.logs[19].check('This commit was generated by cvs2svn ' + 'to compensate for changes in r18,', ( + ('/%(trunk)s/proj', 'A'), + ('/%(trunk)s/proj/default ' + '(from /%(branches)s/vendorbranch/proj/default:18)', 'A'), + ('/%(trunk)s/proj/sub1', 'A'), + ('/%(trunk)s/proj/sub1/default ' + '(from /%(branches)s/vendorbranch/proj/sub1/default:18)', 'A'), + ('/%(trunk)s/proj/sub1/subsubA', 'A'), + ('/%(trunk)s/proj/sub1/subsubA/default ' + '(from /%(branches)s/vendorbranch/proj/sub1/subsubA/default:18)', 'A'), + ('/%(trunk)s/proj/sub1/subsubB', 'A'), + ('/%(trunk)s/proj/sub1/subsubB/default ' + '(from /%(branches)s/vendorbranch/proj/sub1/subsubB/default:18)', 'A'), + ('/%(trunk)s/proj/sub2', 'A'), + ('/%(trunk)s/proj/sub2/default ' + '(from /%(branches)s/vendorbranch/proj/sub2/default:18)', 'A'), + ('/%(trunk)s/proj/sub2/subsubA', 'A'), + ('/%(trunk)s/proj/sub2/subsubA/default ' + '(from /%(branches)s/vendorbranch/proj/sub2/subsubA/default:18)', 'A'), + ('/%(trunk)s/proj/sub3', 'A'), + ('/%(trunk)s/proj/sub3/default ' + '(from 
/%(branches)s/vendorbranch/proj/sub3/default:18)', 'A'), )) - fromstr = ' (from /%(branches)s/vendorbranch:25)' + fromstr = ' (from /%(branches)s/B_FROM_INITIALS:20)' # Tag on rev 1.1.1.1 of all files in proj + conv.logs[20].check(sym_log_msg('B_FROM_INITIALS'), ( + ('/%(branches)s/B_FROM_INITIALS (from /%(branches)s/vendorbranch:18)', + 'A'), + ('/%(branches)s/B_FROM_INITIALS/single-files', 'D'), + ('/%(branches)s/B_FROM_INITIALS/partial-prune', 'D'), + )) + + # The same, as a tag log = conv.find_tag_log('T_ALL_INITIAL_FILES') log.check(sym_log_msg('T_ALL_INITIAL_FILES',1), ( ('/%(tags)s/T_ALL_INITIAL_FILES'+fromstr, 'A'), - ('/%(tags)s/T_ALL_INITIAL_FILES/single-files', 'D'), - ('/%(tags)s/T_ALL_INITIAL_FILES/partial-prune', 'D'), - )) - - # The same, as a branch - conv.logs[26].check(sym_log_msg('B_FROM_INITIALS'), ( - ('/%(branches)s/B_FROM_INITIALS'+fromstr, 'A'), - ('/%(branches)s/B_FROM_INITIALS/single-files', 'D'), - ('/%(branches)s/B_FROM_INITIALS/partial-prune', 'D'), )) # Tag on rev 1.1.1.1 of all files in proj, except one log = conv.find_tag_log('T_ALL_INITIAL_FILES_BUT_ONE') log.check(sym_log_msg('T_ALL_INITIAL_FILES_BUT_ONE',1), ( ('/%(tags)s/T_ALL_INITIAL_FILES_BUT_ONE'+fromstr, 'A'), - ('/%(tags)s/T_ALL_INITIAL_FILES_BUT_ONE/single-files', 'D'), - ('/%(tags)s/T_ALL_INITIAL_FILES_BUT_ONE/partial-prune', 'D'), ('/%(tags)s/T_ALL_INITIAL_FILES_BUT_ONE/proj/sub1/subsubB', 'D'), )) # The same, as a branch - conv.logs[27].check(sym_log_msg('B_FROM_INITIALS_BUT_ONE'), ( + conv.logs[23].check(sym_log_msg('B_FROM_INITIALS_BUT_ONE'), ( ('/%(branches)s/B_FROM_INITIALS_BUT_ONE'+fromstr, 'A'), - ('/%(branches)s/B_FROM_INITIALS_BUT_ONE/single-files', 'D'), - ('/%(branches)s/B_FROM_INITIALS_BUT_ONE/partial-prune', 'D'), ('/%(branches)s/B_FROM_INITIALS_BUT_ONE/proj/sub1/subsubB', 'D'), )) -def simple_tags_variants(): - "simple tags, with alternate repo layout" - simple_tags(trunk='a', branches='b', tags='c') - simple_tags(trunk='a/1', branches='b/1', tags='c/1') - 
simple_tags(trunk='a/1', branches='a/2', tags='a/3') - - def simple_branch_commits(): "simple branch commits" # See test-data/main-cvsrepos/proj/README. conv = ensure_conversion('main') - rev = 35 - conv.logs[rev].check('Modify three files, on branch B_MIXED.', ( + conv.logs[29].check('Modify three files, on branch B_MIXED.', ( ('/%(branches)s/B_MIXED/proj/default', 'M'), ('/%(branches)s/B_MIXED/proj/sub1/default', 'M'), ('/%(branches)s/B_MIXED/proj/sub2/subsubA/default', 'M'), @@ -922,17 +1035,9 @@ def mixed_time_tag(): conv = ensure_conversion('main') log = conv.find_tag_log('T_MIXED') - expected = ( - ('/%(tags)s/T_MIXED (from /%(trunk)s:31)', 'A'), - ('/%(tags)s/T_MIXED/partial-prune', 'D'), - ('/%(tags)s/T_MIXED/single-files', 'D'), - ('/%(tags)s/T_MIXED/proj/sub2/subsubA ' - '(from /%(trunk)s/proj/sub2/subsubA:23)', 'R'), - ('/%(tags)s/T_MIXED/proj/sub3 (from /%(trunk)s/proj/sub3:30)', 'R'), - ) - if log.revision == 16: - expected.append(('/%(tags)s', 'A')) - log.check_changes(expected) + log.check_changes(( + ('/%(tags)s/T_MIXED (from /%(branches)s/B_MIXED:26)', 'A'), + )) def mixed_time_branch_with_added_file(): @@ -942,16 +1047,16 @@ def mixed_time_branch_with_added_file(): # A branch from the same place as T_MIXED in the previous test, # plus a file added directly to the branch - conv.logs[32].check(sym_log_msg('B_MIXED'), ( - ('/%(branches)s/B_MIXED (from /%(trunk)s:31)', 'A'), + conv.logs[26].check(sym_log_msg('B_MIXED'), ( + ('/%(branches)s/B_MIXED (from /%(trunk)s:25)', 'A'), ('/%(branches)s/B_MIXED/partial-prune', 'D'), ('/%(branches)s/B_MIXED/single-files', 'D'), ('/%(branches)s/B_MIXED/proj/sub2/subsubA ' - '(from /%(trunk)s/proj/sub2/subsubA:23)', 'R'), - ('/%(branches)s/B_MIXED/proj/sub3 (from /%(trunk)s/proj/sub3:30)', 'R'), + '(from /%(branches)s/vendorbranch/proj/sub2/subsubA:25)', 'R'), + ('/%(branches)s/B_MIXED/proj/sub3 (from /%(trunk)s/proj/sub3:24)', 'R'), )) - conv.logs[34].check('Add a file on branch B_MIXED.', ( + 
conv.logs[28].check('Add a file on branch B_MIXED.', ( ('/%(branches)s/B_MIXED/proj/sub2/branch_B_MIXED_only', 'A'), )) @@ -961,7 +1066,7 @@ def mixed_commit(): # See test-data/main-cvsrepos/proj/README. conv = ensure_conversion('main') - conv.logs[36].check( + conv.logs[30].check( 'A single commit affecting one file on branch B_MIXED ' 'and one on trunk.', ( ('/%(trunk)s/proj/sub2/default', 'M'), @@ -974,16 +1079,15 @@ def split_time_branch(): # See test-data/main-cvsrepos/proj/README. conv = ensure_conversion('main') - rev = 42 # First change on the branch, creating it - conv.logs[rev].check(sym_log_msg('B_SPLIT'), ( - ('/%(branches)s/B_SPLIT (from /%(trunk)s:36)', 'A'), + conv.logs[31].check(sym_log_msg('B_SPLIT'), ( + ('/%(branches)s/B_SPLIT (from /%(trunk)s:30)', 'A'), ('/%(branches)s/B_SPLIT/partial-prune', 'D'), ('/%(branches)s/B_SPLIT/single-files', 'D'), ('/%(branches)s/B_SPLIT/proj/sub1/subsubB', 'D'), )) - conv.logs[rev + 1].check('First change on branch B_SPLIT.', ( + conv.logs[36].check('First change on branch B_SPLIT.', ( ('/%(branches)s/B_SPLIT/proj/default', 'M'), ('/%(branches)s/B_SPLIT/proj/sub1/default', 'M'), ('/%(branches)s/B_SPLIT/proj/sub1/subsubA/default', 'M'), @@ -992,18 +1096,18 @@ def split_time_branch(): )) # A trunk commit for the file which was not branched - conv.logs[rev + 2].check('A trunk change to sub1/subsubB/default. ' + conv.logs[37].check('A trunk change to sub1/subsubB/default. 
' 'This was committed about an', ( ('/%(trunk)s/proj/sub1/subsubB/default', 'M'), )) # Add the file not already branched to the branch, with modification:w - conv.logs[rev + 3].check(sym_log_msg('B_SPLIT'), ( + conv.logs[38].check(sym_log_msg('B_SPLIT'), ( ('/%(branches)s/B_SPLIT/proj/sub1/subsubB ' - '(from /%(trunk)s/proj/sub1/subsubB:44)', 'A'), + '(from /%(trunk)s/proj/sub1/subsubB:37)', 'A'), )) - conv.logs[rev + 4].check('This change affects sub3/default and ' + conv.logs[39].check('This change affects sub3/default and ' 'sub1/subsubB/default, on branch', ( ('/%(branches)s/B_SPLIT/proj/sub1/subsubB/default', 'M'), ('/%(branches)s/B_SPLIT/proj/sub3/default', 'M'), @@ -1014,10 +1118,10 @@ def multiple_tags(): "multiple tags referring to same revision" conv = ensure_conversion('main') if not conv.path_exists('tags', 'T_ALL_INITIAL_FILES', 'proj', 'default'): - raise svntest.Failure + raise Failure() if not conv.path_exists( 'tags', 'T_ALL_INITIAL_FILES_BUT_ONE', 'proj', 'default'): - raise svntest.Failure + raise Failure() def bogus_tag(): "conversion of invalid symbolic names" @@ -1028,40 +1132,40 @@ def overlapping_branch(): "ignore a file with a branch with two names" conv = ensure_conversion('overlapping-branch', error_re='.*cannot also have name \'vendorB\'') - rev = 4 - conv.logs[rev].check_change('/%(branches)s/vendorA (from /%(trunk)s:3)', - 'A') - # We don't know what order the first two commits would be in, since - # they have different log messages but the same timestamps. As only - # one of the files would be on the vendorB branch in the regression - # case being tested here, we allow for either order. 
- if (conv.logs[rev].get_path_op( - '/%(branches)s/vendorB (from /%(trunk)s:2)') == 'A' - or conv.logs[rev].get_path_op( - '/%(branches)s/vendorB (from /%(trunk)s:3)') == 'A'): - raise svntest.Failure - conv.logs[rev + 1].check_changes(()) - if len(conv.logs) != rev + 1: - raise svntest.Failure + conv.logs[2].check('imported', ( + ('/%(branches)s/vendorA', 'A'), + ('/%(branches)s/vendorA/nonoverlapping-branch', 'A'), + ('/%(branches)s/vendorA/overlapping-branch', 'A'), + )) + + conv.logs[3].check('This commit was generated by cvs2svn ' + 'to compensate for changes in r2,', ( + ('/%(trunk)s/nonoverlapping-branch ' + '(from /%(branches)s/vendorA/nonoverlapping-branch:2)', 'A'), + ('/%(trunk)s/overlapping-branch ' + '(from /%(branches)s/vendorA/overlapping-branch:2)', 'A'), + )) + if len(conv.logs) != 3: + raise Failure() -def phoenix_branch(**kw): +class PhoenixBranch(Cvs2SvnTestCase): "convert a branch file rooted in a 'dead' revision" - conv = ensure_conversion('phoenix', **kw) - conv.logs[8].check(sym_log_msg('volsung_20010721'), ( - ('/%(branches)s/volsung_20010721 (from /%(trunk)s:7)', 'A'), - ('/%(branches)s/volsung_20010721/file.txt', 'D'), + + def __init__(self, **kw): + Cvs2SvnTestCase.__init__(self, 'phoenix', **kw) + + def run(self): + conv = self.ensure_conversion() + conv.logs[8].check('This file was supplied by Jack Moffitt', ( + ('/%(branches)s/volsung_20010721', 'A'), + ('/%(branches)s/volsung_20010721/phoenix', 'A'), )) conv.logs[9].check('This file was supplied by Jack Moffitt', ( - ('/%(branches)s/volsung_20010721/phoenix', 'A'), + ('/%(branches)s/volsung_20010721/phoenix', 'M'), )) -def phoenix_branch_variants(): - "'dead' revision, with alternate repo layout" - phoenix_branch(trunk='a/1', branches='b/1', tags='c/1') - - ###TODO: We check for 4 changed paths here to accomodate creating tags ###and branches in rev 1, but that will change, so this will ###eventually change back. 
@@ -1074,7 +1178,7 @@ def ctrl_char_in_log(): ('/%(trunk)s/ctrl-char-in-log', 'A'), )) if conv.logs[rev].msg.find('\x04') < 0: - raise svntest.Failure( + raise Failure( "Log message of 'ctrl-char-in-log,v' (rev 2) is wrong.") @@ -1083,20 +1187,18 @@ def overdead(): conv = ensure_conversion('overdead') -def no_trunk_prune(**kw): +class NoTrunkPrune(Cvs2SvnTestCase): "ensure that trunk doesn't get pruned" - conv = ensure_conversion('overdead', **kw) + + def __init__(self, **kw): + Cvs2SvnTestCase.__init__(self, 'overdead', **kw) + + def run(self): + conv = self.ensure_conversion() for rev in conv.logs.keys(): rev_logs = conv.logs[rev] if rev_logs.get_path_op('/%(trunk)s') == 'D': - raise svntest.Failure - - -def no_trunk_prune_variants(): - "no trunk pruning, with alternate repo layout" - no_trunk_prune(trunk='a', branches='b', tags='c') - no_trunk_prune(trunk='a/1', branches='b/1', tags='c/1') - no_trunk_prune(trunk='a/1', branches='a/2', tags='a/3') + raise Failure() def double_delete(): @@ -1111,14 +1213,14 @@ def double_delete(): path = '/%(trunk)s/twice-removed' rev = 2 - conv.logs[rev].check_change(path, 'A') - conv.logs[rev].check_msg('Initial revision') - - conv.logs[rev + 1].check_change(path, 'D') - conv.logs[rev + 1].check_msg('Remove this file for the first time.') - - if conv.logs[rev + 1].get_path_op('/%(trunk)s') is not None: - raise svntest.Failure + conv.logs[rev].check('Updated CVS', ( + (path, 'A'), + )) + conv.logs[rev + 1].check('Remove this file for the first time.', ( + (path, 'D'), + )) + conv.logs[rev + 2].check('Remove this file for the second time,', ( + )) def split_branch(): @@ -1126,7 +1228,7 @@ def split_branch(): # See test-data/split-branch-cvsrepos/README. # # The conversion will fail if the bug is present, and - # ensure_conversion will raise svntest.Failure. + # ensure_conversion will raise Failure. 
conv = ensure_conversion('split-branch') @@ -1135,37 +1237,37 @@ def resync_misgroups(): # See test-data/resync-misgroups-cvsrepos/README. # # The conversion will fail if the bug is present, and - # ensure_conversion will raise svntest.Failure. + # ensure_conversion will raise Failure. conv = ensure_conversion('resync-misgroups') -def tagged_branch_and_trunk(**kw): +class TaggedBranchAndTrunk(Cvs2SvnTestCase): "allow tags with mixed trunk and branch sources" - conv = ensure_conversion('tagged-branch-n-trunk', **kw) - tags = kw.get('tags', 'tags') + def __init__(self, **kw): + Cvs2SvnTestCase.__init__(self, 'tagged-branch-n-trunk', **kw) + + def run(self): + conv = self.ensure_conversion() + + tags = conv.symbols.get('tags', 'tags') a_path = conv.get_wc(tags, 'some-tag', 'a.txt') b_path = conv.get_wc(tags, 'some-tag', 'b.txt') if not (os.path.exists(a_path) and os.path.exists(b_path)): - raise svntest.Failure + raise Failure() if (open(a_path, 'r').read().find('1.24') == -1) \ or (open(b_path, 'r').read().find('1.5') == -1): - raise svntest.Failure - - -def tagged_branch_and_trunk_variants(): - "mixed tags, with alternate repo layout" - tagged_branch_and_trunk(trunk='a/1', branches='a/2', tags='a/3') + raise Failure() def enroot_race(): "never use the rev-in-progress as a copy source" # See issue #1427 and r8544. 
conv = ensure_conversion('enroot-race') - rev = 8 + rev = 7 conv.logs[rev].check_changes(( - ('/%(branches)s/mybranch (from /%(trunk)s:7)', 'A'), + ('/%(branches)s/mybranch (from /%(trunk)s:6)', 'A'), ('/%(branches)s/mybranch/proj/a.txt', 'D'), ('/%(branches)s/mybranch/proj/b.txt', 'D'), )) @@ -1181,31 +1283,31 @@ def enroot_race_obo(): conv = ensure_conversion('enroot-race-obo') conv.logs[3].check_change('/%(branches)s/BRANCH (from /%(trunk)s:2)', 'A') if not len(conv.logs) == 3: - raise svntest.Failure + raise Failure() -def branch_delete_first(**kw): +class BranchDeleteFirst(Cvs2SvnTestCase): "correctly handle deletion as initial branch action" + + def __init__(self, **kw): + Cvs2SvnTestCase.__init__(self, 'branch-delete-first', **kw) + + def run(self): # See test-data/branch-delete-first-cvsrepos/README. # # The conversion will fail if the bug is present, and - # ensure_conversion would raise svntest.Failure. - conv = ensure_conversion('branch-delete-first', **kw) + # ensure_conversion would raise Failure. + conv = self.ensure_conversion() - branches = kw.get('branches', 'branches') + branches = conv.symbols.get('branches', 'branches') # 'file' was deleted from branch-1 and branch-2, but not branch-3 if conv.path_exists(branches, 'branch-1', 'file'): - raise svntest.Failure + raise Failure() if conv.path_exists(branches, 'branch-2', 'file'): - raise svntest.Failure + raise Failure() if not conv.path_exists(branches, 'branch-3', 'file'): - raise svntest.Failure - - -def branch_delete_first_variants(): - "initial delete, with alternate repo layout" - branch_delete_first(trunk='a/1', branches='a/2', tags='a/3') + raise Failure() def nonascii_filenames(): @@ -1247,7 +1349,7 @@ def nonascii_filenames(): # So we're going to skip this test on Mac OS X for now. 
   if sys.platform == "darwin":
-    raise svntest.Skip
+    raise svntest.Skip()

   try:
     # change locale to non-UTF-8 locale to generate latin1 names
@@ -1255,7 +1357,7 @@ def nonascii_filenames():
                   new_locale)
     locale_changed = 1
   except locale.Error:
-    raise svntest.Skip
+    raise svntest.Skip()
   try:
     srcrepos_path = os.path.join(test_data_dir,'main-cvsrepos')
@@ -1277,6 +1379,32 @@ def nonascii_filenames():
     svntest.main.safe_rmtree(dstrepos_path)


+class UnicodeLog(Cvs2SvnTestCase):
+  "log message contains unicode"
+
+  warning_pattern = r'WARNING\: problem encoding author or log message'
+
+  def __init__(self, warning_expected, **kw):
+    Cvs2SvnTestCase.__init__(self, 'unicode-log', **kw)
+    self.warning_expected = warning_expected
+
+  def run(self):
+    try:
+      # ensure the availability of the "utf_8" encoding:
+      u'a'.encode('utf_8').decode('utf_8')
+    except LookupError:
+      raise svntest.Skip()
+
+    conv = self.ensure_conversion()
+
+    if self.warning_expected:
+      if not conv.output_found(self.warning_pattern):
+        raise Failure()
+    else:
+      if conv.output_found(self.warning_pattern):
+        raise Failure()
+
+
 def vendor_branch_sameness():
   "avoid spurious changes for initial revs"
   conv = ensure_conversion('vendor-branch-sameness')
@@ -1299,15 +1427,16 @@ def vendor_branch_sameness():
   #
   # (Log messages for the same revisions are the same in all files.)
   #
-  # What we expect to see is everyone added in r1, then trunk/proj
-  # copied in r2.  In the copy, only a.txt should be left untouched;
-  # b.txt should be 'M'odified, and (for different reasons) c.txt and
-  # d.txt should be 'D'eleted.
+  # We expect that only a.txt is recognized as an import whose 1.1
+  # revision can be omitted.  The other files should be added on trunk
+  # then filled to vbranchA, whereas a.txt should be added to vbranchA
+  # then copied to trunk.  In the copy of 1.1.1.1 back to trunk, only
+  # a.txt should be copied untouched; b.txt should be 'M'odified, and
+  # c.txt should be 'D'eleted.
   rev = 2
   conv.logs[rev].check('Initial revision', (
     ('/%(trunk)s/proj', 'A'),
-    ('/%(trunk)s/proj/a.txt', 'A'),
     ('/%(trunk)s/proj/b.txt', 'A'),
     ('/%(trunk)s/proj/c.txt', 'A'),
     ('/%(trunk)s/proj/d.txt', 'A'),
@@ -1319,10 +1448,18 @@ def vendor_branch_sameness():
     ))

   conv.logs[rev + 2].check('First vendor branch revision.', (
+    ('/%(branches)s/vbranchA/proj/a.txt', 'A'),
     ('/%(branches)s/vbranchA/proj/b.txt', 'M'),
     ('/%(branches)s/vbranchA/proj/c.txt', 'D'),
     ))

+  conv.logs[rev + 3].check('This commit was generated by cvs2svn '
+                           'to compensate for changes in r4,', (
+    ('/%(trunk)s/proj/a.txt (from /branches/vbranchA/proj/a.txt:4)', 'A'),
+    ('/%(trunk)s/proj/b.txt (from /branches/vbranchA/proj/b.txt:4)', 'R'),
+    ('/%(trunk)s/proj/c.txt', 'D'),
+    ))
+

 def default_branches():
   "handle default branches correctly"
@@ -1358,146 +1495,208 @@ def default_branches():
   # never had a default branch.
   #

-  conv.logs[18].check(sym_log_msg('vtag-4',1), (
-    ('/%(tags)s/vtag-4 (from /%(branches)s/vbranchA:16)', 'A'),
-    ('/%(tags)s/vtag-4/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:16)', 'A'),
+  conv.logs[2].check("Import (vbranchA, vtag-1).", (
+    ('/%(branches)s/unlabeled-1.1.1', 'A'),
+    ('/%(branches)s/unlabeled-1.1.1/proj', 'A'),
+    ('/%(branches)s/unlabeled-1.1.1/proj/d.txt', 'A'),
+    ('/%(branches)s/unlabeled-1.1.1/proj/e.txt', 'A'),
+    ('/%(branches)s/vbranchA', 'A'),
+    ('/%(branches)s/vbranchA/proj', 'A'),
+    ('/%(branches)s/vbranchA/proj/a.txt', 'A'),
+    ('/%(branches)s/vbranchA/proj/b.txt', 'A'),
+    ('/%(branches)s/vbranchA/proj/c.txt', 'A'),
+    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'A'),
     ))

-  conv.logs[6].check(sym_log_msg('vtag-1',1), (
-    ('/%(tags)s/vtag-1 (from /%(branches)s/vbranchA:5)', 'A'),
-    ('/%(tags)s/vtag-1/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:5)', 'A'),
+  conv.logs[3].check("This commit was generated by cvs2svn "
+                     "to compensate for changes in r2,", (
+    ('/%(trunk)s/proj', 'A'),
+    ('/%(trunk)s/proj/a.txt (from /%(branches)s/vbranchA/proj/a.txt:2)', 'A'),
+    ('/%(trunk)s/proj/b.txt (from /%(branches)s/vbranchA/proj/b.txt:2)', 'A'),
+    ('/%(trunk)s/proj/c.txt (from /%(branches)s/vbranchA/proj/c.txt:2)', 'A'),
+    ('/%(trunk)s/proj/d.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:2)', 'A'),
+    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt '
+     '(from /%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt:2)', 'A'),
+    ('/%(trunk)s/proj/e.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:2)', 'A'),
     ))

-  conv.logs[9].check(sym_log_msg('vtag-2',1), (
-    ('/%(tags)s/vtag-2 (from /%(branches)s/vbranchA:7)', 'A'),
-    ('/%(tags)s/vtag-2/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:7)', 'A'),
+  conv.logs[4].check(sym_log_msg('vtag-1',1), (
+    ('/%(tags)s/vtag-1 (from /%(branches)s/vbranchA:2)', 'A'),
+    ('/%(tags)s/vtag-1/proj/d.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:2)', 'A'),
     ))

-  conv.logs[12].check(sym_log_msg('vtag-3',1), (
-    ('/%(tags)s/vtag-3 (from /%(branches)s/vbranchA:10)', 'A'),
-    ('/%(tags)s/vtag-3/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:10)', 'A'),
-    ('/%(tags)s/vtag-3/proj/e.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:10)', 'A'),
+  conv.logs[5].check("Import (vbranchA, vtag-2).", (
+    ('/%(branches)s/unlabeled-1.1.1/proj/d.txt', 'M'),
+    ('/%(branches)s/unlabeled-1.1.1/proj/e.txt', 'M'),
+    ('/%(branches)s/vbranchA/proj/a.txt', 'M'),
+    ('/%(branches)s/vbranchA/proj/b.txt', 'M'),
+    ('/%(branches)s/vbranchA/proj/c.txt', 'M'),
+    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'M'),
     ))

-  conv.logs[17].check("This commit was generated by cvs2svn "
-                      "to compensate for changes in r16,", (
+  conv.logs[6].check("This commit was generated by cvs2svn "
+                     "to compensate for changes in r5,", (
+    ('/%(trunk)s/proj/a.txt '
+     '(from /%(branches)s/vbranchA/proj/a.txt:5)', 'R'),
     ('/%(trunk)s/proj/b.txt '
-     '(from /%(branches)s/vbranchA/proj/b.txt:16)', 'R'),
+     '(from /%(branches)s/vbranchA/proj/b.txt:5)', 'R'),
     ('/%(trunk)s/proj/c.txt '
-     '(from /%(branches)s/vbranchA/proj/c.txt:16)', 'R'),
+     '(from /%(branches)s/vbranchA/proj/c.txt:5)', 'R'),
     ('/%(trunk)s/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:16)', 'R'),
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:5)', 'R'),
     ('/%(trunk)s/proj/deleted-on-vendor-branch.txt '
-     '(from /%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt:16)',
-     'A'),
+     '(from /%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt:5)',
+     'R'),
     ('/%(trunk)s/proj/e.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:16)', 'R'),
+     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:5)', 'R'),
+    ))
+
+  conv.logs[7].check(sym_log_msg('vtag-2',1), (
+    ('/%(tags)s/vtag-2 (from /%(branches)s/vbranchA:5)', 'A'),
+    ('/%(tags)s/vtag-2/proj/d.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:5)', 'A'),
     ))

-  conv.logs[16].check("Import (vbranchA, vtag-4).", (
+  conv.logs[8].check("Import (vbranchA, vtag-3).", (
     ('/%(branches)s/unlabeled-1.1.1/proj/d.txt', 'M'),
     ('/%(branches)s/unlabeled-1.1.1/proj/e.txt', 'M'),
     ('/%(branches)s/vbranchA/proj/a.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/added-then-imported.txt', 'M'), # CHECK!!!
     ('/%(branches)s/vbranchA/proj/b.txt', 'M'),
     ('/%(branches)s/vbranchA/proj/c.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'A'),
+    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'D'),
     ))

-  conv.logs[15].check(sym_log_msg('vbranchA'), (
-    ('/%(branches)s/vbranchA/proj/added-then-imported.txt '
-     '(from /%(trunk)s/proj/added-then-imported.txt:14)', 'A'),
+  conv.logs[9].check("This commit was generated by cvs2svn "
+                     "to compensate for changes in r8,", (
+    ('/%(trunk)s/proj/a.txt '
+     '(from /%(branches)s/vbranchA/proj/a.txt:8)', 'R'),
+    ('/%(trunk)s/proj/b.txt '
+     '(from /%(branches)s/vbranchA/proj/b.txt:8)', 'R'),
+    ('/%(trunk)s/proj/c.txt '
+     '(from /%(branches)s/vbranchA/proj/c.txt:8)', 'R'),
+    ('/%(trunk)s/proj/d.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:8)', 'R'),
+    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'D'),
+    ('/%(trunk)s/proj/e.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:8)', 'R'),
     ))

-  conv.logs[14].check("Add a file to the working copy.", (
-    ('/%(trunk)s/proj/added-then-imported.txt', 'A'),
+  conv.logs[10].check(sym_log_msg('vtag-3',1), (
+    ('/%(tags)s/vtag-3 (from /%(branches)s/vbranchA:8)', 'A'),
+    ('/%(tags)s/vtag-3/proj/d.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:8)', 'A'),
+    ('/%(tags)s/vtag-3/proj/e.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:8)', 'A'),
     ))

-  conv.logs[13].check("First regular commit, to a.txt, on vtag-3.", (
+  conv.logs[11].check("First regular commit, to a.txt, on vtag-3.", (
     ('/%(trunk)s/proj/a.txt', 'M'),
     ))

-  conv.logs[11].check("This commit was generated by cvs2svn "
-                      "to compensate for changes in r10,", (
-    ('/%(trunk)s/proj/a.txt '
-     '(from /%(branches)s/vbranchA/proj/a.txt:10)', 'R'),
-    ('/%(trunk)s/proj/b.txt '
-     '(from /%(branches)s/vbranchA/proj/b.txt:10)', 'R'),
-    ('/%(trunk)s/proj/c.txt '
-     '(from /%(branches)s/vbranchA/proj/c.txt:10)', 'R'),
-    ('/%(trunk)s/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:10)', 'R'),
-    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'D'),
-    ('/%(trunk)s/proj/e.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:10)', 'R'),
+  conv.logs[12].check("Add a file to the working copy.", (
+    ('/%(trunk)s/proj/added-then-imported.txt', 'A'),
+    ))
+
+  conv.logs[13].check(sym_log_msg('vbranchA'), (
+    ('/%(branches)s/vbranchA/proj/added-then-imported.txt '
+     '(from /%(trunk)s/proj/added-then-imported.txt:12)', 'A'),
     ))

-  conv.logs[10].check("Import (vbranchA, vtag-3).", (
+  conv.logs[14].check("Import (vbranchA, vtag-4).", (
     ('/%(branches)s/unlabeled-1.1.1/proj/d.txt', 'M'),
     ('/%(branches)s/unlabeled-1.1.1/proj/e.txt', 'M'),
     ('/%(branches)s/vbranchA/proj/a.txt', 'M'),
+    ('/%(branches)s/vbranchA/proj/added-then-imported.txt', 'M'), # CHECK!!!
     ('/%(branches)s/vbranchA/proj/b.txt', 'M'),
     ('/%(branches)s/vbranchA/proj/c.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'D'),
+    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'A'),
     ))

-  conv.logs[8].check("This commit was generated by cvs2svn "
-                     "to compensate for changes in r7,", (
-    ('/%(trunk)s/proj/a.txt '
-     '(from /%(branches)s/vbranchA/proj/a.txt:7)', 'R'),
+  conv.logs[15].check("This commit was generated by cvs2svn "
+                      "to compensate for changes in r14,", (
     ('/%(trunk)s/proj/b.txt '
-     '(from /%(branches)s/vbranchA/proj/b.txt:7)', 'R'),
+     '(from /%(branches)s/vbranchA/proj/b.txt:14)', 'R'),
     ('/%(trunk)s/proj/c.txt '
-     '(from /%(branches)s/vbranchA/proj/c.txt:7)', 'R'),
+     '(from /%(branches)s/vbranchA/proj/c.txt:14)', 'R'),
     ('/%(trunk)s/proj/d.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:7)', 'R'),
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:14)', 'R'),
     ('/%(trunk)s/proj/deleted-on-vendor-branch.txt '
-     '(from /%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt:7)',
-     'R'),
+     '(from /%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt:14)',
+     'A'),
     ('/%(trunk)s/proj/e.txt '
-     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:7)', 'R'),
+     '(from /%(branches)s/unlabeled-1.1.1/proj/e.txt:14)', 'R'),
     ))

-  conv.logs[7].check("Import (vbranchA, vtag-2).", (
-    ('/%(branches)s/unlabeled-1.1.1/proj/d.txt', 'M'),
-    ('/%(branches)s/unlabeled-1.1.1/proj/e.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/a.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/b.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/c.txt', 'M'),
-    ('/%(branches)s/vbranchA/proj/deleted-on-vendor-branch.txt', 'M'),
+  conv.logs[16].check(sym_log_msg('vtag-4',1), (
+    ('/%(tags)s/vtag-4 (from /%(branches)s/vbranchA:14)', 'A'),
+    ('/%(tags)s/vtag-4/proj/d.txt '
+     '(from /%(branches)s/unlabeled-1.1.1/proj/d.txt:14)', 'A'),
     ))

-  conv.logs[5].check("Import (vbranchA, vtag-1).", ())

-  conv.logs[4].check(sym_log_msg('vbranchA'), (
-    ('/%(branches)s/vbranchA (from /%(trunk)s:2)', 'A'),
-    ('/%(branches)s/vbranchA/proj/d.txt', 'D'),
-    ('/%(branches)s/vbranchA/proj/e.txt', 'D'),
-    ))
+def default_branches_trunk_only():
+  "handle default branches with --trunk-only"

-  conv.logs[3].check(sym_log_msg('unlabeled-1.1.1'), (
-    ('/%(branches)s/unlabeled-1.1.1 (from /%(trunk)s:2)', 'A'),
-    ('/%(branches)s/unlabeled-1.1.1/proj/a.txt', 'D'),
-    ('/%(branches)s/unlabeled-1.1.1/proj/b.txt', 'D'),
-    ('/%(branches)s/unlabeled-1.1.1/proj/c.txt', 'D'),
-    ('/%(branches)s/unlabeled-1.1.1/proj/deleted-on-vendor-branch.txt', 'D'),
-    ))
+  conv = ensure_conversion('default-branches', args=['--trunk-only'])

-  conv.logs[2].check("Initial revision", (
+  conv.logs[2].check("Import (vbranchA, vtag-1).", (
     ('/%(trunk)s/proj', 'A'),
     ('/%(trunk)s/proj/a.txt', 'A'),
     ('/%(trunk)s/proj/b.txt', 'A'),
     ('/%(trunk)s/proj/c.txt', 'A'),
     ('/%(trunk)s/proj/d.txt', 'A'),
-    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'A'),
     ('/%(trunk)s/proj/e.txt', 'A'),
+    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'A'),
     ))

+  conv.logs[3].check("Import (vbranchA, vtag-2).", (
+    ('/%(trunk)s/proj/a.txt', 'M'),
+    ('/%(trunk)s/proj/b.txt', 'M'),
+    ('/%(trunk)s/proj/c.txt', 'M'),
+    ('/%(trunk)s/proj/d.txt', 'M'),
+    ('/%(trunk)s/proj/e.txt', 'M'),
+    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'M'),
+    ))
+
+  conv.logs[4].check("Import (vbranchA, vtag-3).", (
+    ('/%(trunk)s/proj/a.txt', 'M'),
+    ('/%(trunk)s/proj/b.txt', 'M'),
+    ('/%(trunk)s/proj/c.txt', 'M'),
+    ('/%(trunk)s/proj/d.txt', 'M'),
+    ('/%(trunk)s/proj/e.txt', 'M'),
+    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'D'),
+    ))
+
+  conv.logs[5].check("First regular commit, to a.txt, on vtag-3.", (
+    ('/%(trunk)s/proj/a.txt', 'M'),
+    ))
+
+  conv.logs[6].check("Add a file to the working copy.", (
+    ('/%(trunk)s/proj/added-then-imported.txt', 'A'),
+    ))
+
+  conv.logs[7].check("Import (vbranchA, vtag-4).", (
+    ('/%(trunk)s/proj/b.txt', 'M'),
+    ('/%(trunk)s/proj/c.txt', 'M'),
+    ('/%(trunk)s/proj/d.txt', 'M'),
+    ('/%(trunk)s/proj/e.txt', 'M'),
+    ('/%(trunk)s/proj/deleted-on-vendor-branch.txt', 'A'),
+    ))
+
+
+def default_branch_and_1_2():
+  "do not allow 1.2 revision with default branch"
+
+  conv = ensure_conversion(
+      'default-branch-and-1-2',
+      error_re=(
+          r'.*File has default branch=1\.1\.1 but also a revision 1\.2'
+          ),
+      )
+

 def compose_tag_three_sources():
   "compose a tag from three sources"
@@ -1555,39 +1754,42 @@ def pass5_when_to_fill():
   "reserve a svn revnum for a fill only when required"
   # The conversion will fail if the bug is present, and
-  # ensure_conversion would raise svntest.Failure.
+  # ensure_conversion would raise Failure.
   conv = ensure_conversion('pass5-when-to-fill')


-def empty_trunk(**kw):
+class EmptyTrunk(Cvs2SvnTestCase):
   "don't break when the trunk is empty"
-  # The conversion will fail if the bug is present, and
-  # ensure_conversion would raise svntest.Failure.
-  conv = ensure_conversion('empty-trunk', **kw)
+  def __init__(self, **kw):
+    Cvs2SvnTestCase.__init__(self, 'empty-trunk', **kw)

-def empty_trunk_variants():
-  "empty trunk, with alternate repo layout"
-  empty_trunk(trunk='a', branches='b', tags='c')
-  empty_trunk(trunk='a/1', branches='a/2', tags='a/3')
+  def run(self):
+    # The conversion will fail if the bug is present, and
+    # ensure_conversion would raise Failure.
+    conv = self.ensure_conversion()


 def no_spurious_svn_commits():
   "ensure that we don't create any spurious commits"
   conv = ensure_conversion('phoenix')

-  # Check spurious commit that could be created in CVSCommit._pre_commit
+  # Check spurious commit that could be created in
+  # SVNCommitCreator._pre_commit()
+  #
   # (When you add a file on a branch, CVS creates a trunk revision
   # in state 'dead'.  If the log message of that commit is equal to
   # the one that CVS generates, we do not ever create a 'fill'
   # SVNCommit for it.)
   #
-  # and spurious commit that could be created in CVSCommit._commit
+  # and spurious commit that could be created in
+  # SVNCommitCreator._commit()
+  #
   # (When you add a file on a branch, CVS creates a trunk revision
   # in state 'dead'.  If the log message of that commit is equal to
   # the one that CVS generates, we do not create a primary SVNCommit
   # for it.)
-  conv.logs[18].check('File added on branch xiphophorus', (
+  conv.logs[17].check('File added on branch xiphophorus', (
     ('/%(branches)s/xiphophorus/added-on-branch.txt', 'A'),
     ))

@@ -1596,37 +1798,35 @@ def no_spurious_svn_commits():
   # in state 'dead'.  If the log message of that commit is NOT equal
   # to the one that CVS generates, we create a primary SVNCommit to
   # serve as a home for the log message in question.
-  conv.logs[19].check('file added-on-branch2.txt was initially added on '
+  conv.logs[18].check('file added-on-branch2.txt was initially added on '
+      'branch xiphophorus,\nand this log message was tweaked', ())

   # Check spurious commit that could be created in
-  # CVSRevisionAggregator.attempt_to_commit_symbols
-  # (We shouldn't consider a CVSRevision whose op is OP_DEAD as a
-  # candidate for the LastSymbolicNameDatabase.
-  conv.logs[20].check('This file was also added on branch xiphophorus,', (
+  # SVNCommitCreator._commit_symbols().
+  conv.logs[19].check('This file was also added on branch xiphophorus,', (
     ('/%(branches)s/xiphophorus/added-on-branch2.txt', 'A'),
     ))


-def peer_path_pruning(**kw):
+class PeerPathPruning(Cvs2SvnTestCase):
   "make sure that filling prunes paths correctly"
-  conv = ensure_conversion('peer-path-pruning', **kw)
-  conv.logs[8].check(sym_log_msg('BRANCH'), (
-    ('/%(branches)s/BRANCH (from /%(trunk)s:6)', 'A'),
-    ('/%(branches)s/BRANCH/bar', 'D'),
-    ('/%(branches)s/BRANCH/foo (from /%(trunk)s/foo:7)', 'R'),
-    ))
+
+  def __init__(self, **kw):
+    Cvs2SvnTestCase.__init__(self, 'peer-path-pruning', **kw)

-def peer_path_pruning_variants():
-  "filling prune paths, with alternate repo layout"
-  peer_path_pruning(trunk='a/1', branches='a/2', tags='a/3')
+  def run(self):
+    conv = self.ensure_conversion()
+    conv.logs[7].check(sym_log_msg('BRANCH'), (
+      ('/%(branches)s/BRANCH (from /%(trunk)s:5)', 'A'),
+      ('/%(branches)s/BRANCH/bar', 'D'),
+      ('/%(branches)s/BRANCH/foo (from /%(trunk)s/foo:6)', 'R'),
+      ))


 def invalid_closings_on_trunk():
   "verify correct revs are copied to default branches"
   # The conversion will fail if the bug is present, and
-  # ensure_conversion would raise svntest.Failure.
+  # ensure_conversion would raise Failure.
   conv = ensure_conversion('invalid-closings-on-trunk')

@@ -1636,7 +1836,7 @@ def individual_passes():
   conv2 = ensure_conversion('main', passbypass=1)

   if conv.logs != conv2.logs:
-    raise svntest.Failure
+    raise Failure()


 def resync_bug():
@@ -1657,18 +1857,34 @@ def branch_from_default_branch():
   # incorrectly regarding the branch off of the default branch as a
   # non-trunk default branch.  Crystal clear?  I thought so.  See
   # issue #42 for more incoherent blathering.
-  conv.logs[6].check("This commit was generated by cvs2svn", (
+  conv.logs[5].check("This commit was generated by cvs2svn", (
     ('/%(trunk)s/proj/file.txt '
-     '(from /%(branches)s/upstream/proj/file.txt:5)', 'R'),
+     '(from /%(branches)s/upstream/proj/file.txt:4)', 'R'),
     ))

+
 def file_in_attic_too():
   "die if a file exists in and out of the attic"
-  try:
-    ensure_conversion('file-in-attic-too')
-    raise MissingErrorException
-  except svntest.Failure:
-    pass
+  ensure_conversion(
+      'file-in-attic-too',
+      error_re=(
+          r'.*A CVS repository cannot contain both '
+          r'(.*)' + re.escape(os.sep) + r'(.*) '
+          + r'and '
+          + r'\1' + re.escape(os.sep) + r'Attic' + re.escape(os.sep) + r'\2'
+          )
+      )
+
+
+def retain_file_in_attic_too():
+  "test --retain-conflicting-attic-files option"
+  conv = ensure_conversion(
+      'file-in-attic-too', args=['--retain-conflicting-attic-files'])
+  if not conv.path_exists('trunk', 'file.txt'):
+    raise Failure()
+  if not conv.path_exists('trunk', 'Attic', 'file.txt'):
+    raise Failure()
+

 def symbolic_name_filling_guide():
   "reveal a big bug in our SymbolFillingGuide"
@@ -1685,7 +1901,7 @@ class NodeTreeWalkException:

 def node_for_path(node, path):
   "In the tree rooted under SVNTree NODE, return the node at PATH."
   if node.name != '__SVN_ROOT_NODE':
-    raise NodeTreeWalkException
+    raise NodeTreeWalkException()
   path = path.strip('/')
   components = path.split('/')
   for component in components:
@@ -1698,111 +1914,142 @@ def props_for_path(node, path):
   return node_for_path(node, path).props


-def eol_mime():
-  "test eol settings and mime types together"
-  ###TODO: It's a bit klugey to construct this path here.  But so far
-  ### there's only one test with a mime.types file.  If we have more,
-  ### we should abstract this into some helper, which would be located
-  ### near ensure_conversion().  Note that it is a convention of this
-  ### test suite for a mime.types file to be located in the top level
-  ### of the CVS repository to which it applies.
-  mime_path = os.path.join(test_data_dir, 'eol-mime-cvsrepos', 'mime.types')
+class EOLMime(Cvs2SvnPropertiesTestCase):
+  """eol settings and mime types together
+
+  The files are as follows:
+
+    trunk/foo.txt: no -kb, mime file says nothing.
+    trunk/foo.xml: no -kb, mime file says text.
+    trunk/foo.zip: no -kb, mime file says non-text.
+    trunk/foo.bin: has -kb, mime file says nothing.
+    trunk/foo.csv: has -kb, mime file says text.
+    trunk/foo.dbf: has -kb, mime file says non-text.
+  """
+
+  def __init__(self, args, **kw):
+    # TODO: It's a bit klugey to construct this path here.  But so far
+    # there's only one test with a mime.types file.  If we have more,
+    # we should abstract this into some helper, which would be located
+    # near ensure_conversion().  Note that it is a convention of this
+    # test suite for a mime.types file to be located in the top level
+    # of the CVS repository to which it applies.
+    self.mime_path = os.path.join(
+        test_data_dir, 'eol-mime-cvsrepos', 'mime.types')
+
+    Cvs2SvnPropertiesTestCase.__init__(
+        self, 'eol-mime',
+        props_to_test=['svn:eol-style', 'svn:mime-type', 'svn:keywords'],
+        args=['--mime-types=%s' % self.mime_path] + args,
+        **kw)
+
+
 # We do four conversions.  Each time, we pass --mime-types=FILE with
-# the same FILE, but vary --no-default-eol and --eol-from-mime-type.
+# the same FILE, but vary --default-eol and --eol-from-mime-type.
 # Thus there's one conversion with neither flag, one with just the
 # former, one with just the latter, and one with both.
-  #
-  # In two of the four conversions, we pass --cvs-revnums to make
-  # certain that there are no bad interactions.
-  #
-  # The files are as follows:
-  #
-  #   trunk/foo.txt: no -kb, mime file says nothing.
-  #   trunk/foo.xml: no -kb, mime file says text.
-  #   trunk/foo.zip: no -kb, mime file says non-text.
-  #   trunk/foo.bin: has -kb, mime file says nothing.
-  #   trunk/foo.csv: has -kb, mime file says text.
-  #   trunk/foo.dbf: has -kb, mime file says non-text.

-  ## Neither --no-default-eol nor --eol-from-mime-type. ##
-  conv = ensure_conversion(
-      'eol-mime', args=['--mime-types=%s' % mime_path, '--cvs-revnums'])
-  conv.check_props(
-      ['svn:eol-style', 'svn:mime-type', 'cvs2svn:cvs-rev'],
-      [
-          ('trunk/foo.txt', ['native', None, '1.2']),
-          ('trunk/foo.xml', ['native', 'text/xml', '1.2']),
-          ('trunk/foo.zip', ['native', 'application/zip', '1.2']),
-          ('trunk/foo.bin', [None, 'application/octet-stream', '1.2']),
-          ('trunk/foo.csv', [None, 'text/csv', '1.2']),
-          ('trunk/foo.dbf', [None, 'application/what-is-dbf', '1.2']),
-          ]
-      )

-  ## Just --no-default-eol, not --eol-from-mime-type. ##
-  conv = ensure_conversion(
-      'eol-mime', args=['--mime-types=%s' % mime_path, '--no-default-eol'])
-  conv.check_props(
-      ['svn:eol-style', 'svn:mime-type', 'cvs2svn:cvs-rev'],
-      [
+# Neither --no-default-eol nor --eol-from-mime-type:
+eol_mime1 = EOLMime(
+    variant=1,
+    args=[],
+    expected_props=[
        ('trunk/foo.txt', [None, None, None]),
        ('trunk/foo.xml', [None, 'text/xml', None]),
        ('trunk/foo.zip', [None, 'application/zip', None]),
        ('trunk/foo.bin', [None, 'application/octet-stream', None]),
        ('trunk/foo.csv', [None, 'text/csv', None]),
        ('trunk/foo.dbf', [None, 'application/what-is-dbf', None]),
-          ]
-      )
+        ])
+

-  ## Just --eol-from-mime-type, not --no-default-eol. ##
-  conv = ensure_conversion('eol-mime', args=[
-      '--mime-types=%s' % mime_path, '--eol-from-mime-type', '--cvs-revnums'
+# Just --no-default-eol, not --eol-from-mime-type:
+eol_mime2 = EOLMime(
+    variant=2,
+    args=['--default-eol=native'],
+    expected_props=[
+        ('trunk/foo.txt', ['native', None, KEYWORDS]),
+        ('trunk/foo.xml', ['native', 'text/xml', KEYWORDS]),
+        ('trunk/foo.zip', ['native', 'application/zip', KEYWORDS]),
+        ('trunk/foo.bin', [None, 'application/octet-stream', None]),
+        ('trunk/foo.csv', [None, 'text/csv', None]),
+        ('trunk/foo.dbf', [None, 'application/what-is-dbf', None]),
      ])
-  conv.check_props(
-      ['svn:eol-style', 'svn:mime-type', 'cvs2svn:cvs-rev'],
-      [
-          ('trunk/foo.txt', ['native', None, '1.2']),
-          ('trunk/foo.xml', ['native', 'text/xml', '1.2']),
-          ('trunk/foo.zip', [None, 'application/zip', '1.2']),
-          ('trunk/foo.bin', [None, 'application/octet-stream', '1.2']),
-          ('trunk/foo.csv', [None, 'text/csv', '1.2']),
-          ('trunk/foo.dbf', [None, 'application/what-is-dbf', '1.2']),
-          ]
-      )

-  ## Both --no-default-eol and --eol-from-mime-type. ##
-  conv = ensure_conversion('eol-mime', args=[
-      '--mime-types=%s' % mime_path, '--eol-from-mime-type',
-      '--no-default-eol'])
-  conv.check_props(
-      ['svn:eol-style', 'svn:mime-type', 'cvs2svn:cvs-rev'],
-      [
+
+# Just --eol-from-mime-type, not --no-default-eol:
+eol_mime3 = EOLMime(
+    variant=3,
+    args=['--eol-from-mime-type'],
+    expected_props=[
        ('trunk/foo.txt', [None, None, None]),
-        ('trunk/foo.xml', ['native', 'text/xml', None]),
+        ('trunk/foo.xml', ['native', 'text/xml', KEYWORDS]),
        ('trunk/foo.zip', [None, 'application/zip', None]),
        ('trunk/foo.bin', [None, 'application/octet-stream', None]),
        ('trunk/foo.csv', [None, 'text/csv', None]),
        ('trunk/foo.dbf', [None, 'application/what-is-dbf', None]),
-          ]
-      )
+        ])
+
+
+# Both --no-default-eol and --eol-from-mime-type:
+eol_mime4 = EOLMime(
+    variant=4,
+    args=['--eol-from-mime-type', '--default-eol=native'],
+    expected_props=[
+        ('trunk/foo.txt', ['native', None, KEYWORDS]),
+        ('trunk/foo.xml', ['native', 'text/xml', KEYWORDS]),
+        ('trunk/foo.zip', [None, 'application/zip', None]),
+        ('trunk/foo.bin', [None, 'application/octet-stream', None]),
+        ('trunk/foo.csv', [None, 'text/csv', None]),
+        ('trunk/foo.dbf', [None, 'application/what-is-dbf', None]),
+        ])


-def keywords():
-  "test setting of svn:keywords property among others"
-  conv = ensure_conversion('keywords')
-  conv.check_props(
-      ['svn:keywords', 'svn:eol-style', 'svn:mime-type'],
-      [
-          ('trunk/foo.default', ['Author Date Id Revision', 'native', None]),
-          ('trunk/foo.kkvl', ['Author Date Id Revision', 'native', None]),
-          ('trunk/foo.kkv', ['Author Date Id Revision', 'native', None]),
+cvs_revnums_off = Cvs2SvnPropertiesTestCase(
+    'eol-mime',
+    description='test non-setting of cvs2svn:cvs-rev property',
+    args=[],
+    props_to_test=['cvs2svn:cvs-rev'],
+    expected_props=[
+        ('trunk/foo.txt', [None]),
+        ('trunk/foo.xml', [None]),
+        ('trunk/foo.zip', [None]),
+        ('trunk/foo.bin', [None]),
+        ('trunk/foo.csv', [None]),
+        ('trunk/foo.dbf', [None]),
+        ])
+
+
+cvs_revnums_on = Cvs2SvnPropertiesTestCase(
+    'eol-mime',
+    description='test setting of cvs2svn:cvs-rev property',
+    args=['--cvs-revnums'],
+    props_to_test=['cvs2svn:cvs-rev'],
+    expected_props=[
+        ('trunk/foo.txt', ['1.2']),
+        ('trunk/foo.xml', ['1.2']),
+        ('trunk/foo.zip', ['1.2']),
+        ('trunk/foo.bin', ['1.2']),
+        ('trunk/foo.csv', ['1.2']),
+        ('trunk/foo.dbf', ['1.2']),
+        ])
+
+
+keywords = Cvs2SvnPropertiesTestCase(
+    'keywords',
+    description='test setting of svn:keywords property among others',
+    args=['--default-eol=native'],
+    props_to_test=['svn:keywords', 'svn:eol-style', 'svn:mime-type'],
+    expected_props=[
+        ('trunk/foo.default', [KEYWORDS, 'native', None]),
+        ('trunk/foo.kkvl', [KEYWORDS, 'native', None]),
+        ('trunk/foo.kkv', [KEYWORDS, 'native', None]),
        ('trunk/foo.kb', [None, None, 'application/octet-stream']),
        ('trunk/foo.kk', [None, 'native', None]),
        ('trunk/foo.ko', [None, 'native', None]),
        ('trunk/foo.kv', [None, 'native', None]),
-          ]
-      )
+        ])


 def ignore():
@@ -1814,11 +2061,11 @@ def ignore():

   if topdir_props['svn:ignore'] != \
      '*.idx\n*.aux\n*.dvi\n*.log\nfoo\nbar\nbaz\nqux\n\n':
-    raise svntest.Failure
+    raise Failure()

   if subdir_props['svn:ignore'] != \
      '*.idx\n*.aux\n*.dvi\n*.log\nfoo\nbar\nbaz\nqux\n\n':
-    raise svntest.Failure
+    raise Failure()


 def requires_cvs():
@@ -1830,13 +2077,13 @@ def requires_cvs():
   cl_contents = file(conv.get_wc("trunk", "client_lock.idl")).read()

   if atsign_contents[-1:] == "@":
-    raise svntest.Failure
+    raise Failure()
   if cl_contents.find("gregh\n//\n//Integration for locks") < 0:
-    raise svntest.Failure
+    raise Failure()

   if not (conv.logs[21].author == "William Lyon Phelps III" and
           conv.logs[20].author == "j random"):
-    raise svntest.Failure
+    raise Failure()


 def questionable_branch_names():
@@ -1870,7 +2117,7 @@ def exclude():
   for log in conv.logs.values():
     for item in log.changed_paths.keys():
       if item.startswith('/branches/') or item.startswith('/tags/'):
-        raise svntest.Failure
+        raise Failure()


 def vendor_branch_delete_add():
@@ -1889,7 +2136,7 @@ def resync_pass2_pull_forward():

 def native_eol():
   "only LFs for svn:eol-style=native files"
-  conv = ensure_conversion('native-eol')
+  conv = ensure_conversion('native-eol', args=['--default-eol=native'])
   lines = run_program(svntest.main.svnadmin_binary, None, 'dump', '-q',
                       conv.repos)
   # Verify that all files in the dump have LF EOLs.  We're actually
@@ -1897,7 +2144,7 @@ def native_eol():
   # LF EOLs, so we're safe.
   for line in lines:
     if line[-1] != '\n' or line[:-1].find('\r') != -1:
-      raise svntest.Failure
+      raise Failure()


 def double_fill():
@@ -1908,6 +2155,20 @@ def double_fill():
   # conversion doesn't fail.


+def double_fill2():
+  "reveal a second bug that created a branch twice"
+  conv = ensure_conversion('double-fill2')
+  conv.logs[6].check_msg(sym_log_msg('BRANCH1'))
+  conv.logs[7].check_msg(sym_log_msg('BRANCH2'))
+  try:
+    # This check should fail:
+    conv.logs[8].check_msg(sym_log_msg('BRANCH2'))
+  except Failure:
+    pass
+  else:
+    raise Failure('Symbol filled twice in a row')
+
+
 def resync_pass2_push_backward():
   "ensure pass2 doesn't push rev too far backward"
   conv = ensure_conversion('resync-pass2-push-backward')
@@ -1947,72 +2208,56 @@ def nested_ttb_directories():
      ]

   for opts in opts_list:
-    try:
     ensure_conversion(
         'main', error_re=r'.*paths .* and .* are not disjoint\.', **opts
         )
-      raise MissingErrorException
-    except svntest.Failure:
-      pass


-def auto_props_ignore_case():
-  "test auto-props (case-insensitive)"
-  ### TODO: It's a bit klugey to construct this path here.  See also
-  ### the comment in eol_mime().
-  auto_props_path = os.path.join(test_data_dir, 'eol-mime-cvsrepos',
-                                 'auto-props')
-
-  # The files are as follows:
-  #
-  #   trunk/foo.txt: no -kb, mime auto-prop says nothing.
-  #   trunk/foo.xml: no -kb, mime auto-prop says text and eol-style=CRLF.
-  #   trunk/foo.zip: no -kb, mime auto-prop says non-text.
-  #   trunk/foo.bin: has -kb, mime auto-prop says nothing.
-  #   trunk/foo.csv: has -kb, mime auto-prop says text.
-  #   trunk/foo.dbf: has -kb, mime auto-prop says non-text.
-  #   trunk/foo.UPCASE1: no -kb, no mime type.
-  #   trunk/foo.UPCASE2: no -kb, no mime type.
+class AutoProps(Cvs2SvnPropertiesTestCase):
+  """Test auto-props.

-  conv = ensure_conversion(
-      'eol-mime',
-      args=['--auto-props=%s' % auto_props_path, '--auto-props-ignore-case'])
-  conv.check_props(
-      ['myprop', 'svn:eol-style', 'svn:mime-type'],
-      [
-          ('trunk/foo.txt', ['txt', 'native', None]),
-          ('trunk/foo.xml', ['xml', 'CRLF', 'text/xml']),
-          ('trunk/foo.zip', ['zip', 'native', 'application/zip']),
-          ('trunk/foo.bin', ['bin', None, 'application/octet-stream']),
-          ('trunk/foo.csv', ['csv', None, 'text/csv']),
-          ('trunk/foo.dbf', ['dbf', None, 'application/what-is-dbf']),
-          ('trunk/foo.UPCASE1', ['UPCASE1', 'native', None]),
-          ('trunk/foo.UPCASE2', ['UPCASE2', 'native', None]),
-          ]
-      )
+  The files are as follows:
+
+    trunk/foo.txt: no -kb, mime auto-prop says nothing.
+    trunk/foo.xml: no -kb, mime auto-prop says text and eol-style=CRLF.
+    trunk/foo.zip: no -kb, mime auto-prop says non-text.
+    trunk/foo.bin: has -kb, mime auto-prop says nothing.
+    trunk/foo.csv: has -kb, mime auto-prop says text and eol-style=CRLF.
+    trunk/foo.dbf: has -kb, mime auto-prop says non-text.
+    trunk/foo.UPCASE1: no -kb, no mime type.
+    trunk/foo.UPCASE2: no -kb, no mime type.
+  """

-def auto_props():
-  "test auto-props (case-sensitive)"
-  # See auto_props for comments.
-  auto_props_path = os.path.join(test_data_dir, 'eol-mime-cvsrepos',
-                                 'auto-props')
+  def __init__(self, args, **kw):
+    ### TODO: It's a bit klugey to construct this path here.  See also
+    ### the comment in eol_mime().
+ auto_props_path = os.path.join( + test_data_dir, 'eol-mime-cvsrepos', 'auto-props') - conv = ensure_conversion( - 'eol-mime', args=['--auto-props=%s' % auto_props_path]) - conv.check_props( - ['myprop', 'svn:eol-style', 'svn:mime-type'], - [ - ('trunk/foo.txt', ['txt', 'native', None]), - ('trunk/foo.xml', ['xml', 'CRLF', 'text/xml']), - ('trunk/foo.zip', ['zip', 'native', 'application/zip']), - ('trunk/foo.bin', ['bin', None, 'application/octet-stream']), - ('trunk/foo.csv', ['csv', None, 'text/csv']), - ('trunk/foo.dbf', ['dbf', None, 'application/what-is-dbf']), - ('trunk/foo.UPCASE1', ['UPCASE1', 'native', None]), - ('trunk/foo.UPCASE2', [None, 'native', None]), - ] - ) + Cvs2SvnPropertiesTestCase.__init__( + self, 'eol-mime', + props_to_test=[ + 'myprop', 'svn:eol-style', 'svn:mime-type', 'svn:keywords'], + args=[ + '--auto-props=%s' % auto_props_path, + '--eol-from-mime-type' + ] + args, + **kw) + + +auto_props_ignore_case = AutoProps( + description="test auto-props", + args=['--default-eol=native'], + expected_props=[ + ('trunk/foo.txt', ['txt', 'native', None, KEYWORDS]), + ('trunk/foo.xml', ['xml', 'CRLF', 'text/xml', KEYWORDS]), + ('trunk/foo.zip', ['zip', None, 'application/zip', None]), + ('trunk/foo.bin', ['bin', None, 'application/octet-stream', None]), + ('trunk/foo.csv', ['csv', 'CRLF', 'text/csv', None]), + ('trunk/foo.dbf', ['dbf', None, 'application/what-is-dbf', None]), + ('trunk/foo.UPCASE1', ['UPCASE1', 'native', None, KEYWORDS]), + ('trunk/foo.UPCASE2', ['UPCASE2', 'native', None, KEYWORDS]), + ]) def ctrl_char_in_filename(): @@ -2034,17 +2279,13 @@ def ctrl_char_in_filename(): # Operating systems that don't allow control characters in # filenames will hopefully have thrown an exception; in that # case, just skip this test. 
- raise svntest.Skip + raise svntest.Skip() - try: conv = ensure_conversion( 'ctrl-char-filename', error_re=(r'.*Character .* in filename .* ' - r'is not supported by subversion\.'), + r'is not supported by Subversion\.'), ) - raise MissingErrorException - except svntest.Failure: - pass finally: svntest.main.safe_rmtree(dstrepos_path) @@ -2091,30 +2332,33 @@ def double_branch_delete(): # modification, the file will end up live on the branch instead of # dead! Make sure it happens at the right time. - conv.logs[6].check(sym_log_msg('Branch_4_0'), ( - ('/%(branches)s/Branch_4_0/IMarshalledValue.java ' - '(from /%(trunk)s/IMarshalledValue.java:5)', 'A'), + conv.logs[6].check('JBAS-2436 - Adding LGPL Header2', ( + ('/%(branches)s/Branch_4_0/IMarshalledValue.java', 'A'), )); - conv.logs[7].check('file IMarshalledValue.java was added on branch', ( + conv.logs[7].check('JBAS-3025 - Removing dependency', ( ('/%(branches)s/Branch_4_0/IMarshalledValue.java', 'D'), )); - conv.logs[8].check('JBAS-2436 - Adding LGPL Header2', ( - ('/%(branches)s/Branch_4_0/IMarshalledValue.java', 'A'), - )); - def symbol_mismatches(): "error for conflicting tag/branch" try: ensure_conversion('symbol-mess') - raise MissingErrorException - except svntest.Failure: + raise MissingErrorException() + except Failure: pass +def overlook_symbol_mismatches(): + "overlook conflicting tag/branch when --trunk-only" + + # This is a test for issue #85. 
+ + ensure_conversion('symbol-mess', args=['--trunk-only']) + + def force_symbols(): "force symbols to be tags/branches" @@ -2123,16 +2367,16 @@ def force_symbols(): args=['--force-branch=MOSTLY_BRANCH', '--force-tag=MOSTLY_TAG']) if conv.path_exists('tags', 'BRANCH') \ or not conv.path_exists('branches', 'BRANCH'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 'TAG') \ or conv.path_exists('branches', 'TAG'): - raise svntest.Failure + raise Failure() if conv.path_exists('tags', 'MOSTLY_BRANCH') \ or not conv.path_exists('branches', 'MOSTLY_BRANCH'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 'MOSTLY_TAG') \ or conv.path_exists('branches', 'MOSTLY_TAG'): - raise svntest.Failure + raise Failure() def commit_blocks_tags(): @@ -2143,8 +2387,8 @@ def commit_blocks_tags(): ensure_conversion( 'symbol-mess', args=(basic_args + ['--force-tag=BRANCH_WITH_COMMIT'])) - raise MissingErrorException - except svntest.Failure: + raise MissingErrorException() + except Failure: pass @@ -2157,8 +2401,8 @@ def blocked_excludes(): ensure_conversion( 'symbol-mess', args=(basic_args + ['--exclude=BLOCKED_BY_%s' % blocker])) - raise MissingErrorException - except svntest.Failure: + raise MissingErrorException() + except Failure: pass @@ -2181,10 +2425,10 @@ def regexp_force_symbols(): args=['--force-branch=MOST.*_BRANCH', '--force-tag=MOST.*_TAG']) if conv.path_exists('tags', 'MOSTLY_BRANCH') \ or not conv.path_exists('branches', 'MOSTLY_BRANCH'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 'MOSTLY_TAG') \ or conv.path_exists('branches', 'MOSTLY_TAG'): - raise svntest.Failure + raise Failure() def heuristic_symbol_default(): @@ -2194,10 +2438,10 @@ def heuristic_symbol_default(): 'symbol-mess', args=['--symbol-default=heuristic']) if conv.path_exists('tags', 'MOSTLY_BRANCH') \ or not conv.path_exists('branches', 'MOSTLY_BRANCH'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 
'MOSTLY_TAG') \ or conv.path_exists('branches', 'MOSTLY_TAG'): - raise svntest.Failure + raise Failure() def branch_symbol_default(): @@ -2207,10 +2451,10 @@ def branch_symbol_default(): 'symbol-mess', args=['--symbol-default=branch']) if conv.path_exists('tags', 'MOSTLY_BRANCH') \ or not conv.path_exists('branches', 'MOSTLY_BRANCH'): - raise svntest.Failure + raise Failure() if conv.path_exists('tags', 'MOSTLY_TAG') \ or not conv.path_exists('branches', 'MOSTLY_TAG'): - raise svntest.Failure + raise Failure() def tag_symbol_default(): @@ -2220,10 +2464,10 @@ def tag_symbol_default(): 'symbol-mess', args=['--symbol-default=tag']) if not conv.path_exists('tags', 'MOSTLY_BRANCH') \ or conv.path_exists('branches', 'MOSTLY_BRANCH'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 'MOSTLY_TAG') \ or conv.path_exists('branches', 'MOSTLY_TAG'): - raise svntest.Failure + raise Failure() def symbol_transform(): @@ -2233,18 +2477,18 @@ def symbol_transform(): 'symbol-mess', args=[ '--symbol-default=heuristic', - '--symbol-transform=^BRANCH:branch', - '--symbol-transform=^TAG:tag', - '--symbol-transform=^MOSTLY_(BRANCH|TAG):MOSTLY.\\1', + '--symbol-transform=BRANCH:branch', + '--symbol-transform=TAG:tag', + '--symbol-transform=MOSTLY_(BRANCH|TAG):MOSTLY.\\1', ]) if not conv.path_exists('branches', 'branch'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 'tag'): - raise svntest.Failure + raise Failure() if not conv.path_exists('branches', 'MOSTLY.BRANCH'): - raise svntest.Failure + raise Failure() if not conv.path_exists('tags', 'MOSTLY.TAG'): - raise svntest.Failure + raise Failure() def issue_99(): @@ -2259,7 +2503,7 @@ def issue_100(): conv = ensure_conversion('issue-100') file1 = conv.get_wc('trunk', 'file1.txt') if file(file1).read() != 'file1.txt<1.2>\n': - raise svntest.Failure + raise Failure() def issue_106(): @@ -2274,85 +2518,394 @@ def options_option(): conv = ensure_conversion('main', 
options_file='cvs2svn.options') -#---------------------------------------------------------------------- +def tag_with_no_revision(): + "tag defined but revision is deleted" + + conv = ensure_conversion('tag-with-no-revision') + + +def delete_cvsignore(): + "svn:ignore should vanish when .cvsignore does" + + # This is issue #81. + + conv = ensure_conversion('delete-cvsignore') + + wc_tree = conv.get_wc_tree() + props = props_for_path(wc_tree, 'trunk/proj') + + if props.has_key('svn:ignore'): + raise Failure() + + +def repeated_deltatext(): + "ignore repeated deltatext blocks with warning" + + conv = ensure_conversion( + 'repeated-deltatext', + error_re=(r'.*Deltatext block for revision 1.1 appeared twice'), + ) + + +def nasty_graphs(): + "process some nasty dependency graphs" + + # It's not how well the bear can dance, but that the bear can dance + # at all: + conv = ensure_conversion('nasty-graphs') + + +def tagging_after_delete(): + "optimal tag after deleting files" + + conv = ensure_conversion('tagging-after-delete') + + # tag should be 'clean', no deletes + log = conv.find_tag_log('tag1') + expected = ( + ('/%(tags)s/tag1 (from /%(trunk)s:3)', 'A'), + ) + log.check_changes(expected) + + +def crossed_branches(): + "branches created in inconsistent orders" + + conv = ensure_conversion('crossed-branches') + + +def file_directory_conflict(): + "error when filename conflicts with directory name" + + conv = ensure_conversion( + 'file-directory-conflict', + error_re=r'.*Directory name conflicts with filename', + ) + + +def attic_directory_conflict(): + "error when attic filename conflicts with dirname" + + # This tests the problem reported in issue #105. 
+ + conv = ensure_conversion( + 'attic-directory-conflict', + error_re=r'.*Directory name conflicts with filename', + ) + + +def internal_co(): + "verify that --use-internal-co works" + + rcs_conv = ensure_conversion( + 'main', args=['--use-rcs', '--default-eol=native'], + ) + conv = ensure_conversion( + 'main', args=['--default-eol=native'], + ) + if conv.output_found(r'WARNING\: internal problem\: leftover revisions'): + raise Failure() + rcs_lines = run_program( + svntest.main.svnadmin_binary, None, 'dump', '-q', '-r', '1:HEAD', + rcs_conv.repos) + lines = run_program( + svntest.main.svnadmin_binary, None, 'dump', '-q', '-r', '1:HEAD', + conv.repos) + # Compare all lines following the repository UUID: + if lines[3:] != rcs_lines[3:]: + raise Failure() + + +def internal_co_exclude(): + "verify that --use-internal-co --exclude=... works" + + rcs_conv = ensure_conversion( + 'internal-co', + args=['--use-rcs', '--exclude=BRANCH', '--default-eol=native'], + ) + conv = ensure_conversion( + 'internal-co', + args=['--exclude=BRANCH', '--default-eol=native'], + ) + if conv.output_found(r'WARNING\: internal problem\: leftover revisions'): + raise Failure() + rcs_lines = run_program( + svntest.main.svnadmin_binary, None, 'dump', '-q', '-r', '1:HEAD', + rcs_conv.repos) + lines = run_program( + svntest.main.svnadmin_binary, None, 'dump', '-q', '-r', '1:HEAD', + conv.repos) + # Compare all lines following the repository UUID: + if lines[3:] != rcs_lines[3:]: + raise Failure() + + +def internal_co_trunk_only(): + "verify that --use-internal-co --trunk-only works" + + conv = ensure_conversion( + 'internal-co', + args=['--trunk-only', '--default-eol=native'], + ) + if conv.output_found(r'WARNING\: internal problem\: leftover revisions'): + raise Failure() + + +def leftover_revs(): + "check for leftover checked-out revisions" + + conv = ensure_conversion( + 'leftover-revs', + args=['--exclude=BRANCH', '--default-eol=native'], + ) + if conv.output_found(r'WARNING\: internal 
problem\: leftover revisions'): + raise Failure() + + +def requires_internal_co(): + "test that internal co can do more than RCS" + # See issues 4, 11 for the bugs whose regression we're testing for. + # Unlike in requires_rcs above, issue 29 is not covered. + conv = ensure_conversion('requires-cvs') + + atsign_contents = file(conv.get_wc("trunk", "atsign-add")).read() + + if atsign_contents[-1:] == "@": + raise Failure() + + if not (conv.logs[21].author == "William Lyon Phelps III" and + conv.logs[20].author == "j random"): + raise Failure() + + +def timestamp_chaos(): + "test timestamp adjustments" + + conv = ensure_conversion('timestamp-chaos', args=["-v"]) + + times = [ + '2007-01-01 22:00:00', # Initial commit + '2007-01-01 22:00:00', # revision 1.1 of both files + '2007-01-01 22:00:01', # revision 1.2 of file1.txt, adjusted forwards + '2007-01-01 22:00:02', # revision 1.2 of file1.txt, adjusted backwards + '2007-01-01 23:00:00', # revision 1.3 of both files + ] + for i in range(len(times)): + if abs(conv.logs[i + 1].date - time.mktime(svn_strptime(times[i]))) > 0.1: + raise Failure() + + +def symlinks(): + "convert a repository that contains symlinks" + + # This is a test for issue #97. + + if not os.path.islink( + os.path.join(test_data_dir, 'symlinks-cvsrepos', 'proj', 'dir2') + ): + # Apparently this OS doesn't support symlinks, so skip test. + raise svntest.Skip() + + conv = ensure_conversion('symlinks') + conv.logs[2].check('', ( + ('/%(trunk)s/proj', 'A'), + ('/%(trunk)s/proj/file.txt', 'A'), + ('/%(trunk)s/proj/dir1', 'A'), + ('/%(trunk)s/proj/dir1/file.txt', 'A'), + ('/%(trunk)s/proj/dir2', 'A'), + ('/%(trunk)s/proj/dir2/file.txt', 'A'), + )) + + +def empty_trunk_path(): + "allow --trunk to be empty if --trunk-only" + + # This is a test for issue #53. 
+ + conv = ensure_conversion( + 'main', args=['--trunk-only', '--trunk='], + ) + + +def preferred_parent_cycle(): + "handle a cycle in branch parent preferences" + + conv = ensure_conversion('preferred-parent-cycle') + + +def branch_from_empty_dir(): + "branch from an empty directory" + + conv = ensure_conversion('branch-from-empty-dir') + + +def trunk_readd(): + "add a file on a branch then on trunk" + + conv = ensure_conversion('trunk-readd') + + +def branch_from_deleted_1_1(): + "branch from a 1.1 revision that will be deleted" + + conv = ensure_conversion('branch-from-deleted-1-1') + conv.logs[5].check('Adding b.txt:1.1.2.1', ( + ('/%(branches)s/BRANCH1/proj/b.txt', 'A'), + )) + conv.logs[6].check('Adding b.txt:1.1.4.1', ( + ('/%(branches)s/BRANCH2/proj/b.txt', 'A'), + )) + conv.logs[7].check('Adding b.txt:1.2', ( + ('/%(trunk)s/proj/b.txt', 'A'), + )) + + conv.logs[8].check('Adding c.txt:1.1.2.1', ( + ('/%(branches)s/BRANCH1/proj/c.txt', 'A'), + )) + conv.logs[9].check('Adding c.txt:1.1.4.1', ( + ('/%(branches)s/BRANCH2/proj/c.txt', 'A'), + )) + + +def add_on_branch(): + "add a file on a branch using newer CVS" + + conv = ensure_conversion('add-on-branch') + conv.logs[6].check('Adding b.txt:1.1', ( + ('/%(trunk)s/proj/b.txt', 'A'), + )) + conv.logs[7].check('Adding b.txt:1.1.2.2', ( + ('/%(branches)s/BRANCH1/proj/b.txt', 'A'), + )) + conv.logs[8].check('Adding c.txt:1.1', ( + ('/%(trunk)s/proj/c.txt', 'A'), + )) + conv.logs[9].check('Removing c.txt:1.2', ( + ('/%(trunk)s/proj/c.txt', 'D'), + )) + conv.logs[10].check('Adding c.txt:1.2.2.2', ( + ('/%(branches)s/BRANCH2/proj/c.txt', 'A'), + )) + conv.logs[11].check('Adding d.txt:1.1', ( + ('/%(trunk)s/proj/d.txt', 'A'), + )) + conv.logs[12].check('Adding d.txt:1.1.2.2', ( + ('/%(branches)s/BRANCH3/proj/d.txt', 'A'), + )) + ######################################################################## # Run the tests # list all tests here, starting with None: -test_list = [ None, - show_usage, # 1 +test_list = [ + None, 
+# 1: + show_usage, attr_exec, space_fname, two_quick, - prune_with_care, + PruneWithCare(), + PruneWithCare(variant=1, trunk='a', branches='b', tags='c'), + PruneWithCare(variant=2, trunk='a/1', branches='b/1', tags='c/1'), + PruneWithCare(variant=3, trunk='a/1', branches='a/2', tags='a/3'), interleaved_commits, +# 10: simple_commits, - simple_tags, + SimpleTags(), + SimpleTags(variant=1, trunk='a', branches='b', tags='c'), + SimpleTags(variant=2, trunk='a/1', branches='b/1', tags='c/1'), + SimpleTags(variant=3, trunk='a/1', branches='a/2', tags='a/3'), simple_branch_commits, - mixed_time_tag, # 10 + mixed_time_tag, mixed_time_branch_with_added_file, mixed_commit, split_time_branch, +# 20: bogus_tag, overlapping_branch, - phoenix_branch, + PhoenixBranch(), + PhoenixBranch(variant=1, trunk='a/1', branches='b/1', tags='c/1'), ctrl_char_in_log, overdead, - no_trunk_prune, - double_delete, # 20 + NoTrunkPrune(), + NoTrunkPrune(variant=1, trunk='a', branches='b', tags='c'), + NoTrunkPrune(variant=2, trunk='a/1', branches='b/1', tags='c/1'), + NoTrunkPrune(variant=3, trunk='a/1', branches='a/2', tags='a/3'), +# 30: + double_delete, split_branch, resync_misgroups, - tagged_branch_and_trunk, + TaggedBranchAndTrunk(), + TaggedBranchAndTrunk(variant=1, trunk='a/1', branches='a/2', tags='a/3'), enroot_race, enroot_race_obo, - branch_delete_first, + BranchDeleteFirst(), + BranchDeleteFirst(variant=1, trunk='a/1', branches='a/2', tags='a/3'), nonascii_filenames, +# 40: + UnicodeLog( + warning_expected=1), + UnicodeLog( + warning_expected=0, + variant='encoding', args=['--encoding=utf_8']), + UnicodeLog( + warning_expected=0, + variant='fallback-encoding', args=['--fallback-encoding=utf_8']), vendor_branch_sameness, default_branches, - compose_tag_three_sources, # 30 + default_branches_trunk_only, + default_branch_and_1_2, + compose_tag_three_sources, pass5_when_to_fill, - peer_path_pruning, - empty_trunk, + PeerPathPruning(), +# 50: + PeerPathPruning(variant=1, trunk='a/1', 
branches='a/2', tags='a/3'), + EmptyTrunk(), + EmptyTrunk(variant=1, trunk='a', branches='b', tags='c'), + EmptyTrunk(variant=2, trunk='a/1', branches='a/2', tags='a/3'), no_spurious_svn_commits, invalid_closings_on_trunk, individual_passes, resync_bug, branch_from_default_branch, file_in_attic_too, - symbolic_name_filling_guide, # 40 - eol_mime, +# 60: + retain_file_in_attic_too, + symbolic_name_filling_guide, + eol_mime1, + eol_mime2, + eol_mime3, + eol_mime4, + cvs_revnums_off, + cvs_revnums_on, keywords, ignore, +# 70: requires_cvs, questionable_branch_names, questionable_tag_names, revision_reorder_bug, exclude, vendor_branch_delete_add, - resync_pass2_pull_forward, # 50 + resync_pass2_pull_forward, native_eol, double_fill, + XFail(double_fill2), +# 80: resync_pass2_push_backward, double_add, bogus_branch_copy, nested_ttb_directories, - prune_with_care_variants, - simple_tags_variants, - phoenix_branch_variants, - no_trunk_prune_variants, # 60 - tagged_branch_and_trunk_variants, - branch_delete_first_variants, - empty_trunk_variants, - peer_path_pruning_variants, auto_props_ignore_case, - auto_props, ctrl_char_in_filename, commit_dependencies, show_help_passes, - multiple_tags, # 70 + multiple_tags, double_branch_delete, +# 90: symbol_mismatches, + overlook_symbol_mismatches, force_symbols, commit_blocks_tags, blocked_excludes, @@ -2360,12 +2913,36 @@ test_list = [ None, regexp_force_symbols, heuristic_symbol_default, branch_symbol_default, - tag_symbol_default, # 80 + tag_symbol_default, +# 100: symbol_transform, - XFail(issue_99), - XFail(issue_100), - XFail(issue_106), + issue_99, + issue_100, + issue_106, options_option, + tag_with_no_revision, + XFail(delete_cvsignore), + repeated_deltatext, + nasty_graphs, + XFail(tagging_after_delete), +# 110: + crossed_branches, + file_directory_conflict, + attic_directory_conflict, + internal_co, + internal_co_exclude, + internal_co_trunk_only, + leftover_revs, + requires_internal_co, + timestamp_chaos, + symlinks, +# 
120: + empty_trunk_path, + preferred_parent_cycle, + branch_from_empty_dir, + trunk_readd, + branch_from_deleted_1_1, + add_on_branch, ] if __name__ == '__main__': diff -purNbBwx .svn cvs2svn-1.5.x/setup.py cvs2svn-2.0.0/setup.py --- cvs2svn-1.5.x/setup.py 2006-07-23 23:02:38.000000000 +0200 +++ cvs2svn-2.0.0/setup.py 2007-08-15 22:53:54.000000000 +0200 @@ -6,11 +6,13 @@ from distutils.core import setup assert sys.hexversion >= 0x02020000, "Install Python 2.2 or greater" + def get_version(): - "Return the version number listed in cvs2svn." - d = {} - execfile('cvs2svn', d) - return d['VERSION'] + "Return the version number of cvs2svn." + + from cvs2svn_lib.version import VERSION + return VERSION + setup( # Metadata. diff -purNbBwx .svn cvs2svn-1.5.x/show-db.py cvs2svn-2.0.0/show-db.py --- cvs2svn-1.5.x/show-db.py 2006-09-16 23:13:00.000000000 +0200 +++ cvs2svn-2.0.0/show-db.py 1970-01-01 01:00:00.000000000 +0100 @@ -1,207 +0,0 @@ -#!/usr/bin/env python - -import anydbm -import marshal -import sys -import os -import getopt -import cPickle as pickle -from cStringIO import StringIO - - -def usage(): - cmd = sys.argv[0] - sys.stderr.write('Usage: %s OPTION [DIRECTORY]\n\n' % os.path.basename(cmd)) - sys.stderr.write( - 'Show the contents of the temporary database files created by cvs2svn\n' - 'in a structured human-readable way.\n' - '\n' - 'OPTION is one of:\n' - ' -R SVNRepositoryMirror revisions table\n' - ' -N SVNRepositoryMirror nodes table\n' - ' -r rev SVNRepositoryMirror node tree for specific revision\n' - ' -m MetadataDatabase\n' - ' -l LastSymbolicNameDatabase\n' - ' -f CVSFileDatabase\n' - ' -c PersistenceManager SVNCommit table\n' - ' -C PersistenceManager cvs-revs-to-svn-revnums table\n' - ' -i CVSItemDatabase (normal)\n' - ' -I CVSItemDatabase (resync)\n' - ' -p file Show the given file, assuming it contains a pickle.\n' - '\n' - 'DIRECTORY is the directory containing the temporary database files.\n' - 'If omitted, the current directory is assumed.\n') - 
sys.exit(1) - - -def print_node_tree(db, key="0", name="<rootnode>", prefix=""): - print "%s%s (%s)" % (prefix, name, key) - if name[:1] != "/": - dict = marshal.loads(db[key]) - items = dict.items() - items.sort() - for entry in items: - print_node_tree(db, entry[1], entry[0], prefix + " ") - - -def show_int2str_db(fname): - db = anydbm.open(fname, 'r') - k = map(int, db.keys()) - k.sort() - for i in k: - print "%6d: %s" % (i, db[str(i)]) - -def show_str2marshal_db(fname): - db = anydbm.open(fname, 'r') - k = db.keys() - k.sort() - for i in k: - print "%6s: %s" % (i, marshal.loads(db[i])) - -def show_str2pickle_db(fname): - db = anydbm.open(fname, 'r') - k = db.keys() - k.sort() - for i in k: - o = pickle.loads(db[i]) - print "%6s: %r" % (i, o) - print " %s" % (o,) - -def show_str2ppickle_db(fname): - db = anydbm.open(fname, 'r') - k = db.keys() - k.remove('_') - k.sort(key=lambda s: int(s, 16)) - u1 = pickle.Unpickler(StringIO(db['_'])) - u1.load() - for i in k: - u2 = pickle.Unpickler(StringIO(db[i])) - u2.memo = u1.memo.copy() - o = u2.load() - print "%6s: %r" % (i, o) - print " %s" % (o,) - -def show_cvsitemstore(fname): - f = open(fname, 'rb') - - u1 = pickle.Unpickler(f) - memo = u1.load() - - while True: - u2 = pickle.Unpickler(f) - u2.memo = memo.copy() - try: - items = u2.load() - except EOFError: - break - items.sort(key=lambda i: i.id) - for item in items: - print "%6s: %r" % (item.id, item) - print " %s" % (item,) - - -def show_resynccvsitemstore(fname): - f = open(fname, 'rb') - - u = pickle.Unpickler(f) - (pickler_memo, unpickler_memo,) = u.load() - - while True: - u = pickle.Unpickler(f) - u.memo = unpickler_memo.copy() - try: - item = u.load() - except EOFError: - break - print "%6s: %r" % (item.id, item) - print " %s" % (item,) - - - -class ProjectList: - """A mock project-list that can be assigned to Ctx().projects.""" - - def __init__(self): - self.projects = {} - - def __getitem__(self, i): - return self.projects.setdefault(i, 'Project%d' % i) 
- - -def prime_ctx(): - from cvs2svn_lib.symbol_database import SymbolDatabase - from cvs2svn_lib.cvs_file_database import CVSFileDatabase - from cvs2svn_lib.database import DB_OPEN_READ - from cvs2svn_lib.artifact_manager import artifact_manager - from cvs2svn_lib.context import Ctx - artifact_manager.register_temp_file("cvs2svn-symbols.pck", None) - artifact_manager.register_temp_file("cvs2svn-cvs-files.db", None) - from cvs2svn_lib.cvs_item_database import OldIndexedCVSItemStore - from cvs2svn_lib.metadata_database import MetadataDatabase - from cvs2svn_lib import config - artifact_manager.register_temp_file("cvs2svn-metadata.db", None) - artifact_manager.pass_started(None) - - Ctx().projects = ProjectList() - Ctx()._symbol_db = SymbolDatabase() - Ctx()._cvs_file_db = CVSFileDatabase(DB_OPEN_READ) - Ctx()._cvs_items_db = OldIndexedCVSItemStore( - config.CVS_ITEMS_RESYNC_STORE, config.CVS_ITEMS_RESYNC_INDEX_TABLE) - Ctx()._metadata_db = MetadataDatabase(DB_OPEN_READ) - -def main(): - try: - opts, args = getopt.getopt(sys.argv[1:], "RNr:mlfcCiIp:") - except getopt.GetoptError: - usage() - - if len(args) > 1 or len(opts) != 1: - usage() - - if len(args) == 1: - os.chdir(args[0]) - - for o, a in opts: - if o == "-R": - show_int2str_db("cvs2svn-svn-revisions.db") - elif o == "-N": - show_str2marshal_db("cvs2svn-svn-nodes.db") - elif o == "-r": - try: - revnum = int(a) - except ValueError: - sys.stderr.write('Option -r requires a valid revision number\n') - sys.exit(1) - db = anydbm.open("cvs2svn-svn-revisions.db", 'r') - key = db[str(revnum)] - db.close() - db = anydbm.open("cvs2svn-svn-nodes.db", 'r') - print_node_tree(db, key, "Revision %d" % revnum) - elif o == "-m": - show_str2marshal_db("cvs2svn-metadata.db") - elif o == "-l": - show_str2marshal_db("cvs2svn-symbol-last-cvs-revs.db") - elif o == "-f": - show_str2pickle_db("cvs2svn-cvs-files.db") - elif o == "-c": - prime_ctx() - show_str2ppickle_db("cvs2svn-svn-commits.db") - elif o == "-C": - 
show_str2marshal_db("cvs2svn-cvs-revs-to-svn-revnums.db") - elif o == "-i": - prime_ctx() - show_cvsitemstore("cvs2svn-cvs-items.pck") - elif o == "-I": - prime_ctx() - show_resynccvsitemstore("cvs2svn-cvs-items-resync.pck") - elif o == "-p": - obj = pickle.load(open(a)) - print repr(obj) - print obj - else: - usage() - sys.exit(2) - - -if __name__ == '__main__': - main() diff -purNbBwx .svn cvs2svn-1.5.x/svntest/actions.py cvs2svn-2.0.0/svntest/actions.py --- cvs2svn-1.5.x/svntest/actions.py 2007-02-12 18:44:55.000000000 +0100 +++ cvs2svn-2.0.0/svntest/actions.py 2007-08-15 22:54:10.000000000 +0200 @@ -50,10 +50,21 @@ class SVNIncorrectDatatype(SVNUnexpected run_and_verify_* API""" pass +class UnorderedOutput: + """Simple class to mark unorder output""" + + def __init__(self, output): + self.output = output + + +def no_sleep_for_timestamps(): + os.environ['SVN_SLEEP_FOR_TIMESTAMPS'] = 'no' + +def do_sleep_for_timestamps(): + os.environ['SVN_SLEEP_FOR_TIMESTAMPS'] = 'yes' def setup_pristine_repository(): - """Create the pristine repository, 'svn import' the greek tree and - checkout the pristine working copy""" + """Create the pristine repository and 'svn import' the greek tree""" # these directories don't exist out of the box, so we may have to create them if not os.path.exists(main.general_wc_dir): @@ -178,6 +189,10 @@ def run_and_verify_svn(message, expected - If it is a single string, invoke match_or_fail() on MESSAGE, the expected output, and the actual output. + - If it is an instance of UnorderedOutput, invoke + compare_unordered_and_display_lines() on MESSAGE, the expected + output, and the actual output. + If EXPECTED_STDOUT is None, do not check stdout. EXPECTED_STDERR may not be None. 
@@ -200,6 +215,9 @@ def run_and_verify_svn(message, expected compare_and_display_lines(message, output_type.upper(), expected, actual) elif type(expected) is type(''): match_or_fail(message, output_type.upper(), expected, actual) + elif isinstance(expected, UnorderedOutput): + compare_unordered_and_display_lines(message, output_type.upper(), + expected.output, actual) elif expected == SVNAnyOutput: if len(actual) == 0: if message is not None: @@ -233,6 +251,24 @@ def run_and_verify_dump(repo_dir): return output + +def load_repo(sbox, dumpfile_path = None, dump_str = None): + "Loads the dumpfile into sbox" + if not dump_str: + dump_str = main.file_read(dumpfile_path, "rb") + + # Create a virgin repos and working copy + main.safe_rmtree(sbox.repo_dir, 1) + main.safe_rmtree(sbox.wc_dir, 1) + main.create_repos(sbox.repo_dir) + + # Load the mergetracking dumpfile into the repos, and check it out the repo + run_and_verify_load(sbox.repo_dir, dump_str) + run_and_verify_svn(None, None, [], "co", sbox.repo_url, sbox.wc_dir) + + return dump_str + + ###################################################################### # Subversion Actions # @@ -448,7 +484,8 @@ def run_and_verify_merge(dir, rev1, rev2 check_props = 0, dry_run = 1, *args): - """Run 'svn merge -rREV1:REV2 URL DIR'.""" + """Run 'svn merge -rREV1:REV2 URL DIR', leaving off the '-r' + argument if both REV1 and REV2 are None.""" if args: run_and_verify_merge2(dir, rev1, rev2, url, None, output_tree, disk_tree, status_tree, skip_tree, error_re_string, @@ -474,7 +511,8 @@ def run_and_verify_merge2(dir, rev1, rev """Run 'svn merge URL1@REV1 URL2@REV2 DIR' if URL2 is not None (for a three-way merge between URLs and WC). - If URL2 is None, run 'svn merge -rREV1:REV2 URL1 DIR'. + If URL2 is None, run 'svn merge -rREV1:REV2 URL1 DIR'. If both REV1 + and REV2 are None, leave off the '-r' argument. If ERROR_RE_STRING, the merge must exit with error, and the error message must match regular expression ERROR_RE_STRING. 
@@ -505,11 +543,15 @@ def run_and_verify_merge2(dir, rev1, rev if isinstance(skip_tree, wc.State): skip_tree = skip_tree.old_tree() + merge_command = [ "merge" ] if url2: - merge_command = ("merge", url1 + "@" + str(rev1),url2 + "@" + str(rev2), - dir) + merge_command.extend((url1 + "@" + str(rev1), url2 + "@" + str(rev2))) else: - merge_command = ("merge", "-r", str(rev1) + ":" + str(rev2), url1, dir) + if not (rev1 is None and rev2 is None): + merge_command.append("-r" + str(rev1) + ":" + str(rev2)) + merge_command.append(url1) + merge_command.append(dir) + merge_command = tuple(merge_command) if dry_run: pre_disk = tree.build_tree_from_wc(dir) @@ -872,17 +914,21 @@ def display_trees(message, label, expect tree.dump_tree(actual) -def display_lines(message, label, expected, actual, expected_is_regexp=None): +def display_lines(message, label, expected, actual, expected_is_regexp=None, + expected_is_unordered=None): """Print MESSAGE, unless it is None, then print EXPECTED (labeled with LABEL) followed by ACTUAL (also labeled with LABEL). Both EXPECTED and ACTUAL may be strings or lists of strings.""" if message is not None: print message if expected is not None: + output = 'EXPECTED %s' % label if expected_is_regexp: - print 'EXPECTED', label + ' (regexp):' - else: - print 'EXPECTED', label + ':' + output += ' (regexp)' + if expected_is_unordered: + output += ' (unordered)' + output += ':' + print output map(sys.stdout.write, expected) if expected_is_regexp: map(sys.stdout.write, '\n') @@ -903,6 +949,15 @@ def compare_and_display_lines(message, l display_lines(message, label, expected, actual) raise main.SVNLineUnequal +def compare_unordered_and_display_lines(message, label, expected, actual): + """Compare two sets of output lines, and print them if they differ, + but disregard the order of the lines. 
MESSAGE is ignored if None.""" + try: + main.compare_unordered_output(expected, actual) + except Failure: + display_lines(message, label, expected, actual, expected_is_unordered=True) + raise main.SVNLineUnequal + def match_or_fail(message, label, expected, actual): """Make sure that regexp EXPECTED matches at least one line in list ACTUAL. If no match, then print MESSAGE (if it's not None), followed by @@ -1000,6 +1055,18 @@ pre-revprop-change hook script and (if a hook_path = main.get_pre_revprop_change_hook_path (repo_dir) main.create_python_hook_script (hook_path, 'import sys; sys.exit(0)') +def disable_revprop_changes(repo_dir, message): + """Disable revprop changes in a repository REPO_DIR by creating a +pre-revprop-change hook script like enable_revprop_changes, except that +the hook prints MESSAGE to stderr and exits non-zero. MESSAGE is printed +very simply, and should have no newlines or quotes.""" + + hook_path = main.get_pre_revprop_change_hook_path (repo_dir) + main.create_python_hook_script (hook_path, + 'import sys\n' + 'sys.stderr.write("%s")\n' + 'sys.exit(1)\n' % (message,)) + def create_failing_post_commit_hook(repo_dir): """Disable commits in a repository REPOS_DIR by creating a post-commit hook script which always reports errors.""" @@ -1008,4 +1075,28 @@ script which always reports errors.""" main.create_python_hook_script (hook_path, 'import sys; ' 'sys.stderr.write("Post-commit hook failed"); ' 'sys.exit(1)') + +# set_prop can be used for binary properties are values like '*' which are not +# handled correctly when specified on the command line. 
+def set_prop(expected_err, name, value, path, valp): + """Set a property with value from a file""" + valf = open(valp, 'wb') + valf.seek(0) + valf.truncate(0) + valf.write(value) + valf.flush() + main.run_svn(expected_err, 'propset', '-F', valp, name, path) + +def check_prop(name, path, exp_out): + """Verify that property NAME on PATH has a value of EXP_OUT""" + # Not using run_svn because binary_mode must be set + out, err = main.run_command(main.svn_binary, None, 1, 'pg', '--strict', + name, path, '--config-dir', + main.default_config_dir) + if out != exp_out: + print "svn pg --strict", name, "output does not match expected." + print "Expected standard output: ", exp_out, "\n" + print "Actual standard output: ", out, "\n" + raise Failure + ### End of file. diff -purNbBwx .svn cvs2svn-1.5.x/svntest/main.py cvs2svn-2.0.0/svntest/main.py --- cvs2svn-1.5.x/svntest/main.py 2007-01-30 22:33:20.000000000 +0100 +++ cvs2svn-2.0.0/svntest/main.py 2007-08-15 22:54:10.000000000 +0200 @@ -16,13 +16,14 @@ ###################################################################### import sys # for argv[] -import os # for popen2() +import os import shutil # for rmtree() import re import stat # for ST_MODE import copy # for deepcopy() import time # for time() import traceback # for print_exc() +import threading import getopt try: @@ -58,12 +59,6 @@ from svntest import wc ##################################################################### # Global stuff -### Grandfather in SVNTreeUnequal, which used to live here. If you're -# ever feeling saucy, you could go through the testsuite and change -# main.SVNTreeUnequal to test.SVNTreeUnequal. -import tree -SVNTreeUnequal = tree.SVNTreeUnequal - class SVNProcessTerminatedBySignal(Failure): "Exception raised if a spawned process segfaulted, aborted, etc." 
pass @@ -88,30 +83,29 @@ class SVNRepositoryCreateFailure(Failure "Exception raised if unable to create a repository" pass +# Define True and False if not provided by Python (<=2.1) +try: + False +except: + False = 0 + True = 1 + # Windows specifics if sys.platform == 'win32': - windows = 1 + windows = True file_scheme_prefix = 'file:///' _exe = '.exe' else: - windows = 0 + windows = False file_scheme_prefix = 'file://' _exe = '' # os.wait() specifics try: from os import wait - platform_with_os_wait = 1 + platform_with_os_wait = True except ImportError: - platform_with_os_wait = 0 - -# The locations of the svn, svnadmin and svnlook binaries, relative to -# the only scripts that import this file right now (they live in ../). -svn_binary = os.path.abspath('../../svn/svn' + _exe) -svnadmin_binary = os.path.abspath('../../svnadmin/svnadmin' + _exe) -svnlook_binary = os.path.abspath('../../svnlook/svnlook' + _exe) -svnsync_binary = os.path.abspath('../../svnsync/svnsync' + _exe) -svnversion_binary = os.path.abspath('../../svnversion/svnversion' + _exe) + platform_with_os_wait = False # The location of our mock svneditor script. svneditor_script = os.path.join(sys.path[0], 'svneditor.py') @@ -124,27 +118,64 @@ wc_passwd = 'rayjandom' # scenarios wc_author2 = 'jconstant' # use the same password as wc_author +###################################################################### +# Global variables set during option parsing. These should not be used +# until the variable command_line_parsed has been set to True, as is +# done in run_tests below. +command_line_parsed = False + +# The locations of the svn, svnadmin and svnlook binaries, relative to +# the only scripts that import this file right now (they live in ../). +# Use --bin to override these defaults. 
+svn_binary = os.path.abspath('../../svn/svn' + _exe) +svnadmin_binary = os.path.abspath('../../svnadmin/svnadmin' + _exe) +svnlook_binary = os.path.abspath('../../svnlook/svnlook' + _exe) +svnsync_binary = os.path.abspath('../../svnsync/svnsync' + _exe) +svnversion_binary = os.path.abspath('../../svnversion/svnversion' + _exe) + # Global variable indicating if we want verbose output. -verbose_mode = 0 +verbose_mode = False # Global variable indicating if we want test data cleaned up after success -cleanup_mode = 0 +cleanup_mode = False # Global variable indicating if svnserve should use Cyrus SASL -enable_sasl = 0 +enable_sasl = False + +# Global variable indicating which DAV library, if any, is in use +# ('neon', 'serf') +http_library = None + +# Global variable indicating what the minor version of the server +# tested against is (4 for 1.4.x, for example). +server_minor_version = 5 + +# Global variable indicating if this is a child process and no cleanup +# of global directories is needed. +is_child_process = False # Global URL to testing area. Default to ra_local, current working dir. test_area_url = file_scheme_prefix + os.path.abspath(os.getcwd()) -if windows == 1: +if windows: test_area_url = test_area_url.replace('\\', '/') +# Location to the pristine repository, will be calculated from test_area_url +# when we know what the user specified for --url. +pristine_url = None + # Global variable indicating the FS type for repository creations. fs_type = None +# End of command-line-set global variables. +###################################################################### + # All temporary repositories and working copies are created underneath # this dir, so there's one point at which to mount, e.g., a ramdisk. work_dir = "svn-test-work" +# Constant for the merge info property. +SVN_PROP_MERGE_INFO = "svn:mergeinfo" + # Where we want all the repositories and working copies to live. # Each test will have its own! 
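The `test_area_url` munging above (a `file:///` prefix plus backslash-to-slash conversion on Windows) can be condensed into a small helper; this is an illustrative sketch, not harness code.

```python
def file_url(path, on_windows):
    """Build a file:// test-area URL the way the harness does: Windows
    paths get a file:/// prefix and forward slashes."""
    prefix = 'file:///' if on_windows else 'file://'
    url = prefix + path
    if on_windows:
        url = url.replace('\\', '/')
    return url
```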
general_repo_dir = os.path.join(work_dir, "repositories") @@ -160,10 +191,6 @@ pristine_dir = os.path.join(temp_dir, "r greek_dump_dir = os.path.join(temp_dir, "greekfiles") default_config_dir = os.path.abspath(os.path.join(temp_dir, "config")) -# Location to the pristine repository, will be calculated from test_area_url -# when we know what the user specified for --url. -pristine_url = None - # # Our pristine greek-tree state. # @@ -243,35 +270,46 @@ def run_command(command, error_expected, return run_command_stdin(command, error_expected, binary_mode, None, *varargs) -# Run any binary, supplying input text, logging the command line -def run_command_stdin(command, error_expected, binary_mode=0, - stdin_lines=None, *varargs): - """Run COMMAND with VARARGS; input STDIN_LINES (a list of strings - which should include newline characters) to program via stdin - this - should not be very large, as if the program outputs more than the OS - is willing to buffer, this will deadlock, with both Python and - COMMAND waiting to write to each other for ever. - Return stdout, stderr as lists of lines. - If ERROR_EXPECTED is None, any stderr also will be printed.""" +# A regular expression that matches arguments that are trivially safe +# to pass on a command line without quoting on any supported operating +# system: +_safe_arg_re = re.compile(r'^[A-Za-z\d\.\_\/\-\:\@]+$') + +def _quote_arg(arg): + """Quote ARG for a command line. + + Simply surround every argument in double-quotes unless it contains + only universally harmless characters. + + WARNING: This function cannot handle arbitrary command-line + arguments. It can easily be confused by shell metacharacters. A + perfect job would be difficult and OS-dependent (see, for example, + http://msdn.microsoft.com/library/en-us/vccelng/htm/progs_12.asp). 
+ In other words, this function is just good enough for what we need + here.""" - args = '' - for arg in varargs: # build the command string arg = str(arg) + if _safe_arg_re.match(arg): + return arg + else: if os.name != 'nt': arg = arg.replace('$', '\$') - args = args + ' "' + arg + '"' + return '"%s"' % (arg,) + +# Run any binary, supplying input text, logging the command line +def spawn_process(command, binary_mode=0,stdin_lines=None, *varargs): + args = ' '.join(map(_quote_arg, varargs)) # Log the command line if verbose_mode: - print 'CMD:', os.path.basename(command) + args, + print 'CMD:', os.path.basename(command) + ' ' + args, if binary_mode: mode = 'b' else: mode = 't' - start = time.time() - infile, outfile, errfile = os.popen3(command + args, mode) + infile, outfile, errfile = os.popen3(command + ' ' + args, mode) if stdin_lines: map(infile.write, stdin_lines) @@ -284,6 +322,8 @@ def run_command_stdin(command, error_exp outfile.close() errfile.close() + exit_code = 0 + if platform_with_os_wait: pid, wait_code = os.wait() @@ -295,6 +335,26 @@ def run_command_stdin(command, error_exp sys.stderr.write("".join(stderr_lines)) raise SVNProcessTerminatedBySignal + return exit_code, stdout_lines, stderr_lines + +def run_command_stdin(command, error_expected, binary_mode=0, + stdin_lines=None, *varargs): + """Run COMMAND with VARARGS; input STDIN_LINES (a list of strings + which should include newline characters) to program via stdin - this + should not be very large, as if the program outputs more than the OS + is willing to buffer, this will deadlock, with both Python and + COMMAND waiting to write to each other for ever. + Return stdout, stderr as lists of lines. 
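The `_safe_arg_re`/`_quote_arg` pair introduced above can be restated as a standalone function: pass through arguments made only of universally harmless characters, quote everything else (escaping `$` for POSIX shells). Like the original, this sketch is deliberately "just good enough", not a general shell-quoting solution.

```python
import re

# Same character class as the harness's _safe_arg_re.
SAFE_ARG_RE = re.compile(r'^[A-Za-z\d\.\_\/\-\:\@]+$')

def quote_arg(arg, posix=True):
    """Quote ARG for a command line; POSIX controls '$' escaping."""
    arg = str(arg)
    if SAFE_ARG_RE.match(arg):
        return arg               # trivially safe, no quoting needed
    if posix:
        arg = arg.replace('$', '\\$')
    return '"%s"' % arg
```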
+ If ERROR_EXPECTED is None, any stderr also will be printed.""" + + if verbose_mode: + start = time.time() + + exit_code, stdout_lines, stderr_lines = spawn_process(command, + binary_mode, + stdin_lines, + *varargs) + if verbose_mode: stop = time.time() print '<TIME = %.6f>' % (stop - start) @@ -305,22 +365,44 @@ def run_command_stdin(command, error_exp return stdout_lines, stderr_lines -def create_config_dir(cfgdir, - config_contents = '#\n', - server_contents = '#\n'): +def create_config_dir(cfgdir, config_contents=None, server_contents=None): "Create config directories and files" # config file names cfgfile_cfg = os.path.join(cfgdir, 'config') - cfgfile_srv = os.path.join(cfgdir, 'server') + cfgfile_srv = os.path.join(cfgdir, 'servers') # create the directory if not os.path.isdir(cfgdir): os.makedirs(cfgdir) + # define default config file contents if none provided + if config_contents is None: + config_contents = """ +# +[miscellany] +interactive-conflicts = false +""" + + # define default server file contents if none provided + if server_contents is None: + if http_library: + server_contents = """ +# +[global] +http-library=%s +""" % (http_library) + else: + server_contents = "#\n" + file_write(cfgfile_cfg, config_contents) file_write(cfgfile_srv, server_contents) +def _with_config_dir(args): + if '--config-dir' in args: + return args + else: + return args + ('--config-dir', default_config_dir) # For running subversion and returning the output def run_svn(error_expected, *varargs): @@ -328,12 +410,7 @@ def run_svn(error_expected, *varargs): If ERROR_EXPECTED is None, any stderr also will be printed. 
If you're just checking that something does/doesn't come out of stdout/stderr, you might want to use actions.run_and_verify_svn().""" - if '--config-dir' in varargs: - return run_command(svn_binary, error_expected, 0, - *varargs) - else: - return run_command(svn_binary, error_expected, 0, - *varargs + ('--config-dir', default_config_dir)) + return run_command(svn_binary, error_expected, 0, *(_with_config_dir(varargs))) # For running svnadmin. Ignores the output. def run_svnadmin(*varargs): @@ -347,7 +424,7 @@ def run_svnlook(*varargs): def run_svnsync(*varargs): "Run svnsync with VARARGS, returns stdout, stderr as list of lines." - return run_command(svnsync_binary, 1, 0, *varargs) + return run_command(svnsync_binary, 1, 0, *(_with_config_dir(varargs))) def run_svnversion(*varargs): "Run svnversion with VARARGS, returns stdout, stderr as list of lines." @@ -404,6 +481,15 @@ def file_write(path, contents, mode = 'w fp.write(contents) fp.close() +# For reading the contents of a file +def file_read(path, mode = 'r'): + """Return the contents of the file at PATH, opening file using MODE, + which is (r)ead by default.""" + fp = open(path, mode) + contents = fp.read() + fp.close() + return contents + # For creating blank new repositories def create_repos(path): """Create a brand-new SVN repository at PATH. If PATH does not yet @@ -413,6 +499,8 @@ def create_repos(path): os.makedirs(path) # this creates all the intermediate dirs, if neccessary opts = ("--bdb-txn-nosync",) + if server_minor_version < 5: + opts += ("--pre-1.5-compatible",) if fs_type is not None: opts += ("--fs-type=" + fs_type,) stdout, stderr = run_command(svnadmin_binary, 1, 0, "create", path, *opts) @@ -429,7 +517,7 @@ def create_repos(path): # Allow unauthenticated users to write to the repos, for ra_svn testing. 
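`run_svn` (and now `run_svnsync`) funnel their arguments through `_with_config_dir`, so every invocation gets an isolated `--config-dir` unless the caller supplies one. The guard reduces to this sketch:

```python
def with_default_option(args, flag, value):
    """Append (FLAG, VALUE) to the args tuple unless FLAG is already
    present -- the pattern _with_config_dir uses (illustrative only)."""
    if flag in args:
        return args
    return args + (flag, value)
```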
file_write(get_svnserve_conf_file_path(path), "[general]\nauth-access = write\n"); - if enable_sasl == 1: + if enable_sasl: file_append(get_svnserve_conf_file_path(path), "realm = svntest\n[sasl]\nuse-sasl = true\n") else: @@ -521,13 +609,13 @@ def create_python_hook_script (hook_path # Use an absolute path since the working directory is not guaranteed hook_path = os.path.abspath(hook_path) # Fill the python file. - file_append ("%s.py" % hook_path, hook_script_code) + file_write ("%s.py" % hook_path, hook_script_code) # Fill the batch wrapper file. file_append ("%s.bat" % hook_path, - "@\"%s\" %s.py\n" % (sys.executable, hook_path)) + "@\"%s\" %s.py %%*\n" % (sys.executable, hook_path)) else: # For all other platforms - file_append (hook_path, "#!%s\n%s" % (sys.executable, hook_script_code)) + file_write (hook_path, "#!%s\n%s" % (sys.executable, hook_script_code)) os.chmod (hook_path, 0755) @@ -545,27 +633,98 @@ def compare_unordered_output(expected, a except ValueError: raise Failure("Expected output does not match actual output") +def write_restrictive_svnserve_conf(repo_dir, anon_access="none"): + "Create a restrictive svnserve.conf file (no anonymous access)." + + fp = open(get_svnserve_conf_file_path(repo_dir), 'w') + fp.write("[general]\nanon-access = %s\nauth-access = write\n" + "authz-db = authz\n" % anon_access) + if enable_sasl == 1: + fp.write("realm = svntest\n[sasl]\nuse-sasl = true\n"); + else: + fp.write("password-db = passwd\n") + fp.close() + +def write_authz_file(sbox, rules, sections=None): + """Write an authz file to SBOX, appropriate for the RA method used, +with authorization rules RULES mapping paths to strings containing +the rules. You can add sections SECTIONS (e.g. groups, aliases...) with +an appropriate list of mappings.
+""" + fp = open(sbox.authz_file, 'w') + if sbox.repo_url.startswith("http"): + prefix = sbox.name + ":" + else: + prefix = "" + if sections: + for p, r in sections.items(): + fp.write("[%s]\n%s\n" % (p, r)) + + for p, r in rules.items(): + fp.write("[%s%s]\n%s\n" % (prefix, p, r)) + fp.close() + def use_editor(func): os.environ['SVN_EDITOR'] = svneditor_script os.environ['SVNTEST_EDITOR_FUNC'] = func + +def merge_notify_line(revstart, revend=None): + """Return an expected output line that describes the beginning of a + merge operation on revisions REVSTART through REVEND.""" + if (revend is None): + if (revstart < 0): + return "--- Undoing r%ld:\n" % abs(revstart) + else: + return "--- Merging r%ld:\n" % revstart + else: + if (revstart > revend): + return "--- Undoing r%ld through r%ld:\n" % (revstart, revend) + else: + return "--- Merging r%ld through r%ld:\n" % (revstart, revend) + + ###################################################################### # Functions which check the test configuration # (useful for conditional XFails) +def _check_command_line_parsed(): + """Raise an exception if the command line has not yet been parsed.""" + if not command_line_parsed: + raise Failure("Condition cannot be tested until command line is parsed") + def is_ra_type_dav(): + _check_command_line_parsed() return test_area_url.startswith('http') def is_ra_type_svn(): + _check_command_line_parsed() return test_area_url.startswith('svn') +def is_ra_type_file(): + _check_command_line_parsed() + return test_area_url.startswith('file') + def is_fs_type_fsfs(): + _check_command_line_parsed() # This assumes that fsfs is the default fs implementation. 
return (fs_type == 'fsfs' or fs_type is None) def is_os_windows(): return (os.name == 'nt') +def is_posix_os(): + return (os.name == 'posix') + +def server_has_mergeinfo(): + _check_command_line_parsed() + return server_minor_version >= 5 + +def server_has_revprop_commit(): + _check_command_line_parsed() + return server_minor_version >= 5 + + ###################################################################### # Sandbox handling @@ -602,7 +761,7 @@ class Sandbox: elif self.repo_url.startswith("svn"): self.authz_file = os.path.join(self.repo_dir, "conf", "authz") - if windows == 1: + if windows: self.repo_url = self.repo_url.replace('\\', '/') self.test_paths = [self.wc_dir, self.repo_dir] @@ -670,6 +829,44 @@ def _cleanup_test_path(path, retrying=No print "WARNING: cleanup failed, will try again later" _deferred_test_paths.append(path) +class SpawnTest(threading.Thread): + """Encapsulate a single test case, run it in a separate child process. + Instead of waiting till the process is finished, add this class to a + list of active tests for follow up in the parent process.""" + def __init__(self, index, tests = None): + threading.Thread.__init__(self) + self.index = index + self.tests = tests + self.result = None + self.stdout_lines = None + self.stderr_lines = None + + def run(self): + command = sys.argv[0] + + args = [] + args.append(str(self.index)) + args.append('-c') + # add some startup arguments from this process + if fs_type: + args.append('--fs-type=' + fs_type) + if test_area_url: + args.append('--url=' + test_area_url) + if verbose_mode: + args.append('-v') + if cleanup_mode: + args.append('--cleanup') + if enable_sasl: + args.append('--enable-sasl') + + self.result, self.stdout_lines, self.stderr_lines =\ + spawn_process(command, 1, None, *args) + # don't trust the exitcode, will not be correct on Windows + if filter(lambda x: x.startswith('FAIL: ') or x.startswith('XPASS: '), + self.stdout_lines): + self.result = 1 + self.tests.append(self) + 
sys.stdout.write('.') class TestRunner: """Encapsulate a single test case (predicate), including logic for @@ -709,7 +906,9 @@ class TestRunner: # Tests that want to use an editor should invoke svntest.main.use_editor. os.environ['SVN_EDITOR'] = '' os.environ['SVNTEST_EDITOR_FUNC'] = '' + actions.no_sleep_for_timestamps() + saved_dir = os.getcwd() try: rc = apply(self.pred.run, (), kw) if rc is not None: @@ -744,6 +943,8 @@ class TestRunner: result = 1 print 'UNEXPECTED EXCEPTION:' traceback.print_exc(file=sys.stdout) + + os.chdir(saved_dir) result = self.pred.convert_result(result) print self.pred.run_text(result), self._print_name() @@ -763,26 +963,71 @@ class TestRunner: # it can be displayed by the 'list' command.) # Func to run one test in the list. -def run_one_test(n, test_list): - "Run the Nth client test in TEST_LIST, return the result." +def run_one_test(n, test_list, parallel = 0, finished_tests = None): + """Run the Nth client test in TEST_LIST, return the result. + + If we're running the tests in parallel, spawn the test in a new process. + """ if (n < 1) or (n > len(test_list) - 1): print "There is no test", `n` + ".\n" return 1 # Run the test. + if parallel: + st = SpawnTest(n, finished_tests) + st.start() + return 0 + else: exit_code = TestRunner(test_list[n], n).run() return exit_code -def _internal_run_tests(test_list, testnums): - """Run the tests from TEST_LIST whose indices are listed in TESTNUMS.""" +def _internal_run_tests(test_list, testnums, parallel): + """Run the tests from TEST_LIST whose indices are listed in TESTNUMS. + + If we're running the tests in parallel, spawn as many parallel processes + as requested and gather the results in a temporary buffer when a child + process is finished. + """ exit_code = 0 + finished_tests = [] + tests_started = 0 + if not parallel: for testnum in testnums: - # 1 is the only return code that indicates actual test failure.
if run_one_test(testnum, test_list) == 1: exit_code = 1 + else: + for testnum in testnums: + # wait till there's a free spot. + while tests_started - len(finished_tests) > parallel: + time.sleep(0.2) + run_one_test(testnum, test_list, parallel, finished_tests) + tests_started += 1 + + # wait for all tests to finish + while len(finished_tests) < len(testnums): + time.sleep(0.2) + + # Sort test results list by test nr. + deco = [(test.index, test) for test in finished_tests] + deco.sort() + finished_tests = [test for (ti, test) in deco] + + # terminate the line of dots + print + + # all tests are finished, find out the result and print the logs. + for test in finished_tests: + if test.stdout_lines: + for line in test.stdout_lines: + sys.stdout.write(line) + if test.stderr_lines: + for line in test.stderr_lines: + sys.stdout.write(line) + if test.result == 1: + exit_code = 1 _cleanup_deferred_test_paths() return exit_code @@ -792,7 +1037,7 @@ def usage(): prog_name = os.path.basename(sys.argv[0]) print "%s [--url] [--fs-type] [--verbose] [--enable-sasl] [--cleanup] \\" \ % prog_name - print "%s [<test> ...]" % (" " * len(prog_name)) + print "%s [--bin] [<test> ...]" % (" " * len(prog_name)) print "%s [--list] [<test> ...]\n" % prog_name print "Arguments:" print " test The number of the test to run (multiple okay), " \ @@ -800,10 +1045,13 @@ def usage(): print "Options:" print " --list Print test doc strings instead of running them" print " --fs-type Subversion file system type (fsfs or bdb)" + print " --http-library DAV library to use (neon or serf)" print " --url Base url to the repos (e.g. svn://localhost)" print " --verbose Print binary command-lines" print " --cleanup Whether to clean up" print " --enable-sasl Whether to enable SASL authentication" + print " --parallel Run the tests in parallel" + print " --bin Use the svn binaries installed in this path" print " --help This information" @@ -811,7 +1059,7 @@ def usage(): # to run their list of tests. 
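The scheduling in `_internal_run_tests` is a simple polling throttle: never let more than `parallel` children be outstanding, then drain and sort. Reduced to a self-contained sketch (the shared `finished` list works because CPython list appends are atomic under the GIL, which the real harness relies on as well):

```python
import threading
import time

def run_throttled(jobs, limit, delay=0.01):
    """Run callables JOBS in threads, at most LIMIT+1 in flight at once,
    and return their results sorted (mirrors _internal_run_tests)."""
    finished = []
    started = 0
    for job in jobs:
        # wait till there's a free spot, as the harness does
        while started - len(finished) > limit:
            time.sleep(delay)
        t = threading.Thread(target=lambda j=job: finished.append(j()))
        t.start()
        started += 1
    # wait for all jobs to finish
    while len(finished) < len(jobs):
        time.sleep(delay)
    return sorted(finished)
```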
# # This routine parses sys.argv to decide what to do. -def run_tests(test_list): +def run_tests(test_list, serial_only = False): """Main routine to run all tests in TEST_LIST. NOTE: this function does not return. It does a sys.exit() with the @@ -824,18 +1072,31 @@ def run_tests(test_list): global verbose_mode global cleanup_mode global enable_sasl + global is_child_process + global svn_binary + global svnadmin_binary + global svnlook_binary + global svnsync_binary + global svnversion_binary + global command_line_parsed + global http_library + global server_minor_version + testnums = [] # Should the tests be listed (as opposed to executed)? - list_tests = 0 + list_tests = False - opts, args = my_getopt(sys.argv[1:], 'vh', + parallel = 0 + svn_bin = None + opts, args = my_getopt(sys.argv[1:], 'vhpc', ['url=', 'fs-type=', 'verbose', 'cleanup', 'list', - 'enable-sasl', 'help']) + 'enable-sasl', 'help', 'parallel', 'bin=', + 'http-library=', 'server-minor-version=']) for arg in args: if arg == "list": # This is an old deprecated variant of the "--list" option: - list_tests = 1 + list_tests = True elif arg.startswith('BASE_URL='): test_area_url = arg[9:] else: @@ -854,29 +1115,63 @@ def run_tests(test_list): fs_type = val elif opt == "-v" or opt == "--verbose": - verbose_mode = 1 + verbose_mode = True elif opt == "--cleanup": - cleanup_mode = 1 + cleanup_mode = True elif opt == "--list": - list_tests = 1 + list_tests = True elif opt == "--enable-sasl": - enable_sasl = 1 + enable_sasl = True elif opt == "-h" or opt == "--help": usage() sys.exit(0) + elif opt == '-p' or opt == "--parallel": + parallel = 5 # use 5 parallel threads. 
+ + elif opt == '-c': + is_child_process = True + + elif opt == '--bin': + svn_bin = val + + elif opt == '--http-library': + http_library = val + + elif opt == '--server-minor-version': + server_minor_version = int(val) + if server_minor_version < 4 or server_minor_version > 6: + print "ERROR: test harness only supports server minor versions 4 through 6" + sys.exit(1) + if test_area_url[-1:] == '/': # Normalize url to have no trailing slash test_area_url = test_area_url[:-1] # Calculate pristine_url from test_area_url. pristine_url = test_area_url + '/' + pristine_dir - if windows == 1: + if windows: pristine_url = pristine_url.replace('\\', '/') + if not svn_bin is None: + svn_binary = os.path.join(svn_bin, 'svn' + _exe) + svnadmin_binary = os.path.join(svn_bin, 'svnadmin' + _exe) + svnlook_binary = os.path.join(svn_bin, 'svnlook' + _exe) + svnsync_binary = os.path.join(svn_bin, 'svnsync' + _exe) + svnversion_binary = os.path.join(svn_bin, 'svnversion' + _exe) + + command_line_parsed = True + + ###################################################################### + + # Cleanup: if a previous run crashed or interrupted the python + # interpreter, then `temp_dir' was never removed. This can cause wonkiness. + if not is_child_process: + safe_rmtree(temp_dir) + if not testnums: # If no test numbers were listed explicitly, include all of them: testnums = range(1, len(test_list))
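The option handling above hangs new behavior (`--parallel`, `--bin`, `--server-minor-version`) off the same getopt loop. A sketch of its shape using `getopt.gnu_getopt` (the harness's `my_getopt` resolves to this where available); the `cfg` dict and validation bound are illustrative:

```python
import getopt

def parse_harness_args(argv):
    """Parse harness-style options; returns (cfg, leftover test numbers)."""
    opts, args = getopt.gnu_getopt(argv, 'vhpc',
        ['url=', 'fs-type=', 'verbose', 'cleanup', 'list', 'enable-sasl',
         'help', 'parallel', 'bin=', 'http-library=',
         'server-minor-version='])
    cfg = {'parallel': 0, 'server_minor_version': 5}
    for opt, val in opts:
        if opt in ('-p', '--parallel'):
            cfg['parallel'] = 5  # fixed pool of 5 worker threads
        elif opt == '--server-minor-version':
            cfg['server_minor_version'] = int(val)
            if not 4 <= cfg['server_minor_version'] <= 6:
                raise ValueError('unsupported server minor version')
    return cfg, args
```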
- exit_code = _internal_run_tests(test_list, testnums) + exit_code = _internal_run_tests(test_list, testnums, parallel) # Remove all scratchwork: the 'pristine' repository, greek tree, etc. # This ensures that an 'import' will happen the next time we run. + if not is_child_process: safe_rmtree(temp_dir) # Cleanup after ourselves. @@ -906,15 +1210,6 @@ def run_tests(test_list): # Return the appropriate exit code from the tests. sys.exit(exit_code) - -###################################################################### -# Initialization - -# Cleanup: if a previous run crashed or interrupted the python -# interpreter, then `temp_dir' was never removed. This can cause wonkiness. - -safe_rmtree(temp_dir) - # the modules import each other, so we do this import very late, to ensure # that the definitions in "main" have been completed. import actions diff -purNbBwx .svn cvs2svn-1.5.x/svntest/testcase.py cvs2svn-2.0.0/svntest/testcase.py --- cvs2svn-1.5.x/svntest/testcase.py 2006-11-30 04:35:58.000000000 +0100 +++ cvs2svn-2.0.0/svntest/testcase.py 2007-08-15 22:54:10.000000000 +0200 @@ -157,33 +157,61 @@ class XFail(TestCase): class Skip(TestCase): - """A test that will be skipped if condition COND is true.""" + """A test that will be skipped if its conditional is true.""" + + def __init__(self, test_case, cond_func=lambda:1): + """Create an Skip instance based on TEST_CASE. COND_FUNC is a + callable that is evaluated at test run time and should return a + boolean value. If COND_FUNC returns true, then TEST_CASE is + skipped; otherwise, TEST_CASE is run normally. 
+ The evaluation of COND_FUNC is deferred so that it can base its + decision on useful bits of information that are not available at + __init__ time (like the fact that we're running over a + particular RA layer).""" - def __init__(self, test_case, cond=1): TestCase.__init__(self) self.test_case = create_test_case(test_case) - self.cond = cond - if self.cond: + self.cond_func = cond_func + try: + if self.conditional(): self._list_mode_text = 'SKIP' + except svntest.Failure: + pass # Delegate most methods to self.test_case: self.get_description = self.test_case.get_description self.get_sandbox_name = self.test_case.get_sandbox_name self.convert_result = self.test_case.convert_result def need_sandbox(self): - if self.cond: + if self.conditional(): return 0 else: return self.test_case.need_sandbox() def run(self, sandbox=None): - if self.cond: + if self.conditional(): raise svntest.Skip elif self.need_sandbox(): return self.test_case.run(sandbox=sandbox) else: return self.test_case.run() + def conditional(self): + """Invoke SELF.cond_func(), and return the result evaluated + against the expected value.""" + return self.cond_func() + + +class SkipUnless(Skip): + """A test that will be skipped if its conditional is false.""" + + def __init__(self, test_case, cond_func): + Skip.__init__(self, test_case, cond_func) + + def conditional(self): + "Return the negation of SELF.cond_func()." 
+ return not self.cond_func() + def create_test_case(func): if isinstance(func, TestCase): diff -purNbBwx .svn cvs2svn-1.5.x/svntest/wc.py cvs2svn-2.0.0/svntest/wc.py --- cvs2svn-1.5.x/svntest/wc.py 2006-09-21 23:01:11.000000000 +0200 +++ cvs2svn-2.0.0/svntest/wc.py 2007-08-15 22:54:10.000000000 +0200 @@ -67,7 +67,12 @@ class State: """ if args: for path in args: - apply(self.desc[path].tweak, (), kw) + try: + path_ref = self.desc[path] + except KeyError, e: + e.args = "Path '%s' not present in WC state descriptor" % path + raise + apply(path_ref.tweak, (), kw) else: for item in self.desc.values(): apply(item.tweak, (), kw) diff -purNbBwx .svn cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/CVSROOT/README 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. diff -purNbBwx .svn cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/makerepo.sh cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/makerepo.sh --- cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/makerepo.sh 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/makerepo.sh 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,127 @@ +#! /bin/sh + +# This is the script used to create the add-on-branch CVS repository. +# (The repository is checked into svn; this script is only here for +# its documentation value.) The script should be started from the +# main cvs2svn directory. 
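The `testcase.py` change replaces a condition evaluated at construction time with a callable evaluated at run time, and `SkipUnless` then just negates it. The essential shape, in a reduced sketch with illustrative names:

```python
class SkipSketch:
    """Skip TEST_FUNC when COND_FUNC() is true at run time -- deferred so
    the decision can use state that only exists after option parsing."""
    def __init__(self, test_func, cond_func=lambda: True):
        self.test_func = test_func
        self.cond_func = cond_func
    def conditional(self):
        return self.cond_func()
    def run(self):
        if self.conditional():
            return 'SKIP'
        return self.test_func()

class SkipUnlessSketch(SkipSketch):
    """Run TEST_FUNC only when COND_FUNC() is true."""
    def conditional(self):
        return not self.cond_func()
```

This is why conditions like `server_has_mergeinfo` in `main.py` are passed as functions, not values.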
+ +# The output of this script depends on the CVS version. Newer CVS +# versions add dead revisions (b.txt:1.1.2.1 and c.txt:1.2.2.1) on the +# branch, presumably to indicate that the file didn't exist on the +# branch during the period of time between the branching point and +# when the 1.x.2.2 revisions were committed. Older versions of CVS do +# not add these extra revisions. The point of this test is to handle +# the new CVS behavior, so set this variable to point at a newish CVS +# executable: + +cvs=$HOME/download/cvs-1.11.21/src/cvs + +name=add-on-branch +repo=`pwd`/test-data/$name-cvsrepos +wc=`pwd`/cvs2svn-tmp/$name-wc +[ -e $repo/CVSROOT ] && rm -rf $repo/CVSROOT +[ -e $repo/proj ] && rm -rf $repo/proj +[ -e $wc ] && rm -rf $wc + +$cvs -d $repo init +$cvs -d $repo co -d $wc . +cd $wc +mkdir proj +$cvs add proj +cd $wc/proj + +echo "Create a file a.txt on trunk:" +echo '1.1' >a.txt +$cvs add a.txt +$cvs commit -m 'Adding a.txt:1.1' . + +echo "Create BRANCH1 on file a.txt:" +$cvs tag -b BRANCH1 + +echo "Create BRANCH2 on file a.txt:" +$cvs tag -b BRANCH2 + +echo "Create BRANCH3 on file a.txt:" +$cvs tag -b BRANCH3 + + + +f=b.txt +b=BRANCH1 + +echo "Add file $f on trunk:" +$cvs up -A +echo "1.1" >$f +$cvs add $f +$cvs commit -m "Adding $f:1.1" + + +echo "Add file $f on $b:" +$cvs up -r $b + +# Ensure that times are distinct: +sleep 2 +echo "1.1.2.2" >$f +$cvs add $f +$cvs commit -m "Adding $f:1.1.2.2" + + + +f=c.txt +b=BRANCH2 + +echo "Add file $f on trunk:" +$cvs up -A +echo "1.1" >$f +$cvs add $f +$cvs commit -m "Adding $f:1.1" + + +echo "Delete file $f on trunk:" +rm $f +$cvs remove $f +$cvs commit -m "Removing $f:1.2" + + +echo "Add file $f on $b:" +$cvs up -r $b + +# Ensure that times are distinct: +sleep 2 +echo "1.2.2.2" >$f +$cvs add $f +$cvs commit -m "Adding $f:1.2.2.2" + + + +f=d.txt +b=BRANCH3 + +echo "Add file $f on trunk:" +$cvs up -A +echo "1.1" >$f +$cvs add $f +$cvs commit -m "Adding $f:1.1" + + +echo "Add file $f on $b:" +$cvs up -r $b + +# 
Ensure that times are distinct: +sleep 2 +echo "1.1.2.2" >$f +$cvs add $f +$cvs commit -m "Adding $f:1.1.2.2" + + +echo "Modify file $f on trunk:" +$cvs up -A +echo "1.2" >$f +$cvs commit -m "Changing $f:1.2" + + +# Erase the unneeded stuff out of CVSROOT: +rm -rf $repo/CVSROOT +mkdir $repo/CVSROOT + + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/Attic/c.txt,v cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/Attic/c.txt,v --- cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/Attic/c.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/Attic/c.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,69 @@ +head 1.2; +access; +symbols + BRANCH2:1.2.0.2; +locks; strict; +comment @# @; + + +1.2 +date 2007.07.05.18.27.28; author mhagger; state dead; +branches + 1.2.2.1; +next 1.1; + +1.1 +date 2007.07.05.18.27.27; author mhagger; state Exp; +branches; +next ; + +1.2.2.1 +date 2007.07.05.18.27.28; author mhagger; state dead; +branches; +next 1.2.2.2; + +1.2.2.2 +date 2007.07.05.18.27.30; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Removing c.txt:1.2 +@ +text +@1.1 +@ + + +1.2.2.1 +log +@file c.txt was added on branch BRANCH2 on 2007-07-05 18:27:30 +0000 +@ +text +@d1 1 +@ + + +1.2.2.2 +log +@Adding c.txt:1.2.2.2 +@ +text +@a0 1 +1.2.2.2 +@ + + +1.1 +log +@Adding c.txt:1.1 +@ +text +@@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/a.txt,v cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/a.txt,v --- cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/a.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/a.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,27 @@ +head 1.1; +access; +symbols + BRANCH3:1.1.0.6 + BRANCH2:1.1.0.4 + BRANCH1:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.07.05.18.27.21; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding 
a.txt:1.1 +@ +text +@1.1 +@ diff -purNbBwx .svn cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/b.txt,v cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/b.txt,v --- cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/b.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/b.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,57 @@ +head 1.1; +access; +symbols + BRANCH1:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.07.05.18.27.22; author mhagger; state Exp; +branches + 1.1.2.1; +next ; + +1.1.2.1 +date 2007.07.05.18.27.22; author mhagger; state dead; +branches; +next 1.1.2.2; + +1.1.2.2 +date 2007.07.05.18.27.25; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding b.txt:1.1 +@ +text +@1.1 +@ + + +1.1.2.1 +log +@file b.txt was added on branch BRANCH1 on 2007-07-05 18:27:25 +0000 +@ +text +@d1 1 +@ + + +1.1.2.2 +log +@Adding b.txt:1.1.2.2 +@ +text +@a0 1 +1.1.2.2 +@ + + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/d.txt,v cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/d.txt,v --- cvs2svn-1.5.x/test-data/add-on-branch-cvsrepos/proj/d.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/add-on-branch-cvsrepos/proj/d.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,73 @@ +head 1.2; +access; +symbols + BRANCH3:1.1.0.2; +locks; strict; +comment @# @; + + +1.2 +date 2007.07.05.20.37.52; author mhagger; state Exp; +branches; +next 1.1; + +1.1 +date 2007.07.05.18.27.32; author mhagger; state Exp; +branches + 1.1.2.1; +next ; + +1.1.2.1 +date 2007.07.05.18.27.32; author mhagger; state dead; +branches; +next 1.1.2.2; + +1.1.2.2 +date 2007.07.05.18.27.35; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Changing d.txt:1.2 +@ +text +@1.2 +@ + + +1.1 +log +@Adding d.txt:1.1 +@ +text +@d1 1 +a1 1 +1.1 +@ + + +1.1.2.1 +log +@file d.txt was added on branch BRANCH3 on 2007-07-05 18:27:35 +0000 +@ +text +@d1 1 
+@ + + +1.1.2.2 +log +@Adding d.txt:1.1.2.2 +@ +text +@a0 1 +1.1.2.2 +@ + + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/attic-directory-conflict-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/attic-directory-conflict-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/attic-directory-conflict-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/attic-directory-conflict-cvsrepos/CVSROOT/README 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. diff -purNbBwx .svn cvs2svn-1.5.x/test-data/attic-directory-conflict-cvsrepos/proj/Attic/file1,v cvs2svn-2.0.0/test-data/attic-directory-conflict-cvsrepos/proj/Attic/file1,v --- cvs2svn-1.5.x/test-data/attic-directory-conflict-cvsrepos/proj/Attic/file1,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/attic-directory-conflict-cvsrepos/proj/Attic/file1,v 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,37 @@ +head 1.2; +access; +symbols; +locks; strict; +comment @# @; + + +1.2 +date 2007.04.08.10.09.19; author mhagger; state dead; +branches; +next 1.1; + +1.1 +date 2007.04.08.08.09.02; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@*** empty log message *** +@ +text +@@ + + +1.1 +log +@*** empty log message *** +@ +text +@@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/attic-directory-conflict-cvsrepos/proj/file1/file2.txt,v cvs2svn-2.0.0/test-data/attic-directory-conflict-cvsrepos/proj/file1/file2.txt,v --- cvs2svn-1.5.x/test-data/attic-directory-conflict-cvsrepos/proj/file1/file2.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ 
cvs2svn-2.0.0/test-data/attic-directory-conflict-cvsrepos/proj/file1/file2.txt,v 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,23 @@ +head 1.1; +access; +symbols; +locks; strict; +comment @# @; + + +1.1 +date 2007.04.08.08.10.10; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@*** empty log message *** +@ +text +@@ diff -purNbBwx .svn cvs2svn-1.5.x/test-data/bogus-branch-copy-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/bogus-branch-copy-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/bogus-branch-copy-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/bogus-branch-copy-cvsrepos/CVSROOT/README 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. 
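An aside, not part of the patch: every `,v` hunk in this test data carries a `symbols` clause mapping tag and branch names to RCS revision numbers (for example `BRANCH3:1.1.0.6 BRANCH2:1.1.0.4 BRANCH1:1.1.0.2` in `a.txt,v` above). A minimal, hypothetical parser for such a clause might look like this; it is an illustration of the RCS file format, not code from cvs2svn:

```python
def parse_symbols(clause):
    """Parse an RCS 'symbols' clause, e.g.
    'BRANCH3:1.1.0.6 BRANCH2:1.1.0.4 BRANCH1:1.1.0.2;'
    into a {name: revision-number} dict.  The trailing ';' that
    terminates the clause in a ,v file is tolerated."""
    symbols = {}
    for entry in clause.replace(';', ' ').split():
        name, _, rev = entry.partition(':')
        if rev:
            symbols[name] = rev
    return symbols
```

Branch symbols carry "magic" numbers with a zero in the next-to-last position (`1.1.0.2`), while plain tags point straight at a revision (`TAG1:1.1`).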
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/bogus-branch-copy-cvsrepos/file2.txt,v cvs2svn-2.0.0/test-data/bogus-branch-copy-cvsrepos/file2.txt,v
--- cvs2svn-1.5.x/test-data/bogus-branch-copy-cvsrepos/file2.txt,v 2005-08-15 08:15:51.000000000 +0200
+++ cvs2svn-2.0.0/test-data/bogus-branch-copy-cvsrepos/file2.txt,v 2007-08-15 22:53:51.000000000 +0200
@@ -60,7 +60,7 @@ text
 
 1.3.2.1
 log
-@file file2.txt.py was added on branch branch-boom on 2005-04-01 23:13:40 +0000
+@file file2.txt was added on branch branch-boom on 2005-04-01 23:13:40 +0000
 @
 text
 @@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-default-branch-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/branch-from-default-branch-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/branch-from-default-branch-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/branch-from-default-branch-cvsrepos/CVSROOT/README 2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-default-branch-cvsrepos/proj/file.txt,v cvs2svn-2.0.0/test-data/branch-from-default-branch-cvsrepos/proj/file.txt,v
--- cvs2svn-1.5.x/test-data/branch-from-default-branch-cvsrepos/proj/file.txt,v 2004-07-10 00:04:57.000000000 +0200
+++ cvs2svn-2.0.0/test-data/branch-from-default-branch-cvsrepos/proj/file.txt,v 2007-08-15 22:53:51.000000000 +0200
@@ -1,5 +1,4 @@
 head 1.2;
-branch 1.1.1;
 access;
 symbols
  branch-off-of-default-branch:1.1.1.2.0.2
@@ -45,9 +44,7 @@ log
 @This is a log message.
 @
 text
-@d1 1
-a1 1
-This is revision 1.2 of file.txt
+@This is revision 1.2 of file.txt
 @
 
 
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/CVSROOT/README 2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/makerepo.sh cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/makerepo.sh
--- cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/makerepo.sh 1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/makerepo.sh 2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,80 @@
+#! /bin/sh
+
+# This is the script used to create the branch-from-deleted-1-1 CVS
+# repository. (The repository is checked into svn; this script is
+# only here for its documentation value.)
+#
+# The script should be started from the main cvs2svn directory.
+
+name=branch-from-deleted-1-1
+repo=`pwd`/test-data/$name-cvsrepos
+wc=`pwd`/cvs2svn-tmp/$name-wc
+[ -e $repo/CVSROOT ] && rm -rf $repo/CVSROOT
+[ -e $repo/proj ] && rm -rf $repo/proj
+[ -e $wc ] && rm -rf $wc
+
+cvs -d $repo init
+cvs -d $repo co -d $wc .
+cd $wc
+mkdir proj
+cvs add proj
+cd proj
+
+echo "Create a file a.txt on trunk:"
+echo '1.1' >a.txt
+cvs add a.txt
+cvs commit -m 'Adding a.txt:1.1' .
+
+echo "Create two branches on file a.txt:"
+cvs tag -b BRANCH1
+cvs tag -b BRANCH2
+
+
+echo "Add file b.txt on BRANCH1:"
+cvs up -r BRANCH1
+
+echo '1.1.2.1' >b.txt
+cvs add b.txt
+cvs commit -m 'Adding b.txt:1.1.2.1'
+
+
+echo "Add file b.txt on BRANCH2:"
+cvs up -r BRANCH2
+
+echo '1.1.4.1' >b.txt
+cvs add b.txt
+cvs commit -m 'Adding b.txt:1.1.4.1'
+
+
+echo "Add file b.txt on trunk:"
+cvs up -A
+echo '1.2' >b.txt
+cvs add b.txt
+cvs commit -m 'Adding b.txt:1.2'
+
+
+
+echo "Add file c.txt on BRANCH1:"
+cvs up -r BRANCH1
+
+echo '1.1.2.1' >c.txt
+cvs add c.txt
+cvs commit -m 'Adding c.txt:1.1.2.1'
+
+
+echo "Add file c.txt on BRANCH2:"
+cvs up -r BRANCH2
+
+echo '1.1.4.1' >c.txt
+cvs add c.txt
+cvs commit -m 'Adding c.txt:1.1.4.1'
+
+
+
+echo "Create branch BRANCH3 from 1.1 versions of b.txt and c.txt:"
+cvs rtag -r 1.1 -b BRANCH3 proj/b.txt proj/c.txt
+
+echo "Create tag TAG1 from 1.1 versions of b.txt and c.txt:"
+cvs rtag -r 1.1 TAG1 proj/b.txt proj/c.txt
+
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/proj/Attic/c.txt,v cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/proj/Attic/c.txt,v
--- cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/proj/Attic/c.txt,v 1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/proj/Attic/c.txt,v 2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,60 @@
+head 1.1;
+access;
+symbols
+ TAG1:1.1
+ BRANCH3:1.1.0.6
+ BRANCH2:1.1.0.4
+ BRANCH1:1.1.0.2;
+locks; strict;
+comment @# @;
+
+
+1.1
+date 2007.06.25.22.20.19; author mhagger; state dead;
+branches
+ 1.1.2.1
+ 1.1.4.1;
+next ;
+
+1.1.2.1
+date 2007.06.25.22.20.19; author mhagger; state Exp;
+branches;
+next ;
+
+1.1.4.1
+date 2007.06.25.22.20.21; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.1
+log
+@file c.txt was initially added on branch BRANCH1.
+@ +text +@@ + + +1.1.4.1 +log +@Adding c.txt:1.1.4.1 +@ +text +@a0 1 +1.1.4.1 +@ + + +1.1.2.1 +log +@Adding c.txt:1.1.2.1 +@ +text +@a0 1 +1.1.2.1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/proj/a.txt,v cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/proj/a.txt,v --- cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/proj/a.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/proj/a.txt,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,26 @@ +head 1.1; +access; +symbols + BRANCH2:1.1.0.4 + BRANCH1:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.06.25.22.20.14; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding a.txt:1.1 +@ +text +@1.1 +@ diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/proj/b.txt,v cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/proj/b.txt,v --- cvs2svn-1.5.x/test-data/branch-from-deleted-1-1-cvsrepos/proj/b.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/branch-from-deleted-1-1-cvsrepos/proj/b.txt,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,75 @@ +head 1.2; +access; +symbols + TAG1:1.1 + BRANCH3:1.1.0.6 + BRANCH2:1.1.0.4 + BRANCH1:1.1.0.2; +locks; strict; +comment @# @; + + +1.2 +date 2007.06.25.22.20.17; author mhagger; state Exp; +branches; +next 1.1; + +1.1 +date 2007.06.25.22.20.15; author mhagger; state dead; +branches + 1.1.2.1 + 1.1.4.1; +next ; + +1.1.2.1 +date 2007.06.25.22.20.15; author mhagger; state Exp; +branches; +next ; + +1.1.4.1 +date 2007.06.25.22.20.16; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Adding b.txt:1.2 +@ +text +@1.2 +@ + + +1.1 +log +@file b.txt was initially added on branch BRANCH1. 
+@ +text +@d1 1 +@ + + +1.1.4.1 +log +@Adding b.txt:1.1.4.1 +@ +text +@a0 1 +1.1.4.1 +@ + + +1.1.2.1 +log +@Adding b.txt:1.1.2.1 +@ +text +@a0 1 +1.1.2.1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/CVSROOT/README 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/makerepo.sh cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/makerepo.sh --- cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/makerepo.sh 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/makerepo.sh 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,53 @@ +#! /bin/sh + +# This is the script used to create the branch-from-empty-dir CVS +# repository. (The repository is checked into svn; this script is +# only here for its documentation value.) +# +# The script should be started from the main cvs2svn directory. +# +# The repository itself tickles a problem that I was having with an +# uncommitted version of better-symbol-selection when BRANCH2 is +# grafted onto BRANCH1. 
+ +name=branch-from-empty-dir +repo=`pwd`/test-data/$name-cvsrepos +wc=`pwd`/cvs2svn-tmp/$name-wc +[ -e $repo/CVSROOT ] && rm -rf $repo/CVSROOT +[ -e $repo/proj ] && rm -rf $repo/proj +[ -e $wc ] && rm -rf $wc + +cvs -d $repo init +cvs -d $repo co -d $wc . +cd $wc +mkdir proj +cvs add proj +cd proj +mkdir subdir +cvs add subdir +echo '1.1' >subdir/b.txt +cvs add subdir/b.txt +cvs commit -m 'Adding subdir/b.txt:1.1' . + + +rm subdir/b.txt +cvs rm subdir/b.txt +cvs commit -m 'Removing subdir/b.txt' . + + +cvs rtag -r 1.2 -b BRANCH1 proj/subdir/b.txt +cvs rtag -r 1.2 -b BRANCH2 proj/subdir/b.txt + + +echo '1.1' >a.txt +cvs add a.txt +cvs commit -m 'Adding a.txt:1.1' . + +cvs tag -b BRANCH1 a.txt +cvs update -r BRANCH1 + +echo '1.1.2.1' >a.txt +cvs commit -m 'Committing a.txt:1.1.2.1' a.txt + +cvs tag -b BRANCH2 a.txt + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/proj/a.txt,v cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/proj/a.txt,v --- cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/proj/a.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/proj/a.txt,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,44 @@ +head 1.1; +access; +symbols + BRANCH2:1.1.2.1.0.2 + BRANCH1:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.04.30.15.20.32; author mhagger; state Exp; +branches + 1.1.2.1; +next ; + +1.1.2.1 +date 2007.04.30.15.20.33; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding a.txt:1.1 +@ +text +@1.1 +@ + + +1.1.2.1 +log +@Committing a.txt:1.1.2.1 +@ +text +@d1 1 +a1 1 +1.1.2.1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/proj/subdir/Attic/b.txt,v cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/proj/subdir/Attic/b.txt,v --- cvs2svn-1.5.x/test-data/branch-from-empty-dir-cvsrepos/proj/subdir/Attic/b.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ 
cvs2svn-2.0.0/test-data/branch-from-empty-dir-cvsrepos/proj/subdir/Attic/b.txt,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,40 @@ +head 1.2; +access; +symbols + BRANCH2:1.2.0.4 + BRANCH1:1.2.0.2; +locks; strict; +comment @# @; + + +1.2 +date 2007.04.30.15.20.32; author mhagger; state dead; +branches; +next 1.1; + +1.1 +date 2007.04.30.15.20.31; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Removing subdir/b.txt +@ +text +@1.1 +@ + + +1.1 +log +@Adding subdir/b.txt:1.1 +@ +text +@@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/crossed-branches-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/crossed-branches-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/crossed-branches-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/crossed-branches-cvsrepos/CVSROOT/README 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. 
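An aside, not part of the patch: the branch symbols in these `,v` files (`BRANCH2:1.2.0.4`, `BRANCH1:1.2.0.2`, and so on) use RCS "magic branch numbers", where a `0` sits in the next-to-last position; revisions committed on such a branch drop that `0` (branch symbol `1.2.0.2` owns revisions `1.2.2.1`, `1.2.2.2`, ...). A hypothetical sketch of that convention, for illustration only:

```python
def branch_prefix(magic):
    """Convert an RCS magic branch number (e.g. '1.2.0.4') to the
    prefix used by revisions on that branch ('1.2.4').  Numbers
    without the magic 0 are returned unchanged."""
    parts = magic.split('.')
    if len(parts) >= 4 and parts[-2] == '0':
        parts = parts[:-2] + parts[-1:]
    return '.'.join(parts)

def on_branch(rev, magic):
    """True if revision `rev` lies directly on the branch named by
    the magic number `magic` (not on a sub-branch of it)."""
    prefix = branch_prefix(magic) + '.'
    return rev.startswith(prefix) and rev.count('.') == prefix.count('.')
```

This is why, in the crossed-branches data below, `1.2.2.1` belongs to the branch registered as `1.2.0.2` while `1.2.4.1` belongs to `1.2.0.4`.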
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/crossed-branches-cvsrepos/proj/file1.txt,v cvs2svn-2.0.0/test-data/crossed-branches-cvsrepos/proj/file1.txt,v --- cvs2svn-1.5.x/test-data/crossed-branches-cvsrepos/proj/file1.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/crossed-branches-cvsrepos/proj/file1.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,77 @@ +head 1.2; +access; +symbols + BRANCH4:1.2.0.4 + BRANCH3:1.2.0.2 + BRANCH2:1.1.0.4 + BRANCH1:1.1.0.2; +locks; strict; +comment @# @; + + +1.2 +date 2007.01.20.14.54.27; author mhagger; state Exp; +branches + 1.2.2.1 + 1.2.4.1; +next 1.1; + +1.1 +date 2007.01.20.14.45.02; author mhagger; state Exp; +branches; +next ; + +1.2.2.1 +date 2007.01.20.14.56.46; author mhagger; state Exp; +branches; +next ; + +1.2.4.1 +date 2007.01.20.14.57.11; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Revisions 1.2 +@ +text +@1.2 +@ + + +1.2.4.1 +log +@Shared commit message +@ +text +@d1 1 +a1 1 +BRANCH4 commit +@ + + +1.2.2.1 +log +@Shared commit message +@ +text +@d1 1 +a1 1 +BRANCH3 commit +@ + + +1.1 +log +@Adding two files +@ +text +@d1 1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/crossed-branches-cvsrepos/proj/file2.txt,v cvs2svn-2.0.0/test-data/crossed-branches-cvsrepos/proj/file2.txt,v --- cvs2svn-1.5.x/test-data/crossed-branches-cvsrepos/proj/file2.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/crossed-branches-cvsrepos/proj/file2.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,77 @@ +head 1.2; +access; +symbols + BRANCH3:1.2.0.4 + BRANCH4:1.2.0.2 + BRANCH1:1.1.0.4 + BRANCH2:1.1.0.2; +locks; strict; +comment @# @; + + +1.2 +date 2007.01.20.14.54.27; author mhagger; state Exp; +branches + 1.2.2.1 + 1.2.4.1; +next 1.1; + +1.1 +date 2007.01.20.14.45.02; author mhagger; state Exp; +branches; +next ; + +1.2.2.1 +date 2007.01.20.14.57.11; author mhagger; state Exp; +branches; +next ; + +1.2.4.1 +date 2007.01.20.14.56.46; author mhagger; state 
Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Revisions 1.2 +@ +text +@1.2 +@ + + +1.2.2.1 +log +@Shared commit message +@ +text +@d1 1 +a1 1 +BRANCH4 commit +@ + + +1.2.4.1 +log +@Shared commit message +@ +text +@d1 1 +a1 1 +BRANCH3 commit +@ + + +1.1 +log +@Adding two files +@ +text +@d1 1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/default-branch-and-1-2-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/default-branch-and-1-2-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/default-branch-and-1-2-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/default-branch-and-1-2-cvsrepos/CVSROOT/README 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. 
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/default-branch-and-1-2-cvsrepos/proj/a.txt,v cvs2svn-2.0.0/test-data/default-branch-and-1-2-cvsrepos/proj/a.txt,v --- cvs2svn-1.5.x/test-data/default-branch-and-1-2-cvsrepos/proj/a.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/default-branch-and-1-2-cvsrepos/proj/a.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,111 @@ +head 1.2; +branch 1.1.1; +access; +symbols + vtag-4:1.1.1.4 + vtag-3:1.1.1.3 + vtag-2:1.1.1.2 + vtag-1:1.1.1.1 + vbranchA:1.1.1; +locks; strict; +comment @# @; + + +1.2 +date 2004.02.09.15.43.14; author kfogel; state Exp; +branches; +next 1.1; + +1.1 +date 2004.02.09.15.43.13; author kfogel; state Exp; +branches + 1.1.1.1; +next ; + +1.1.1.1 +date 2004.02.09.15.43.13; author kfogel; state Exp; +branches; +next 1.1.1.2; + +1.1.1.2 +date 2004.02.09.15.43.13; author kfogel; state Exp; +branches; +next 1.1.1.3; + +1.1.1.3 +date 2004.02.09.15.43.13; author kfogel; state Exp; +branches; +next 1.1.1.4; + +1.1.1.4 +date 2004.02.09.15.43.16; author kfogel; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@First regular commit, to a.txt, on vtag-3. +@ +text +@This is vtag-3 (on vbranchA) of a.txt. +A regular change to a.txt. +@ + + +1.1 +log +@Initial revision +@ +text +@d1 2 +a2 1 +This is vtag-1 (on vbranchA) of a.txt. +@ + + +1.1.1.1 +log +@Import (vbranchA, vtag-1). +@ +text +@@ + + +1.1.1.2 +log +@Import (vbranchA, vtag-2). +@ +text +@d1 1 +a1 1 +This is vtag-2 (on vbranchA) of a.txt. +@ + + +1.1.1.3 +log +@Import (vbranchA, vtag-3). +@ +text +@d1 1 +a1 1 +This is vtag-3 (on vbranchA) of a.txt. +@ + + +1.1.1.4 +log +@Import (vbranchA, vtag-4). +@ +text +@d1 1 +a1 1 +This is vtag-4 (on vbranchA) of a.txt. 
+@ + + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/delete-cvsignore-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/delete-cvsignore-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/delete-cvsignore-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/delete-cvsignore-cvsrepos/CVSROOT/README 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. diff -purNbBwx .svn cvs2svn-1.5.x/test-data/delete-cvsignore-cvsrepos/proj/Attic/.cvsignore,v cvs2svn-2.0.0/test-data/delete-cvsignore-cvsrepos/proj/Attic/.cvsignore,v --- cvs2svn-1.5.x/test-data/delete-cvsignore-cvsrepos/proj/Attic/.cvsignore,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/delete-cvsignore-cvsrepos/proj/Attic/.cvsignore,v 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,38 @@ +head 1.2; +access; +symbols; +locks; strict; +comment @# @; + + +1.2 +date 2006.10.15.13.13.20; author mhagger; state dead; +branches; +next 1.1; + +1.1 +date 2006.10.15.13.12.43; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@Remove .cvsignore +@ +text +@*.o +@ + + +1.1 +log +@Add random file and .cvsignore +@ +text +@@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/delete-cvsignore-cvsrepos/proj/file.txt,v cvs2svn-2.0.0/test-data/delete-cvsignore-cvsrepos/proj/file.txt,v --- cvs2svn-1.5.x/test-data/delete-cvsignore-cvsrepos/proj/file.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/delete-cvsignore-cvsrepos/proj/file.txt,v 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,23 @@ +head 1.1; +access; +symbols; +locks; strict; +comment @# @; + + +1.1 +date 
2006.10.15.13.12.43; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Add random file and .cvsignore +@ +text +@@ diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-add-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/double-add-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/double-add-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/double-add-cvsrepos/CVSROOT/README 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-branch-delete-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/double-branch-delete-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/double-branch-delete-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/double-branch-delete-cvsrepos/CVSROOT/README 2007-08-15 22:53:50.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. 
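An aside, not part of the patch: the `desc`, `log`, and `text` fields in these `,v` hunks are `@`-delimited strings; inside the delimiters a literal `@` is written doubled (`@@`), which is why an empty field appears as a bare `@@`. A hypothetical pair of helpers illustrating that quoting rule:

```python
def rcs_quote(s):
    """Encode text for embedding between '@' delimiters in a ,v
    file: each literal '@' is doubled."""
    return s.replace('@', '@@')

def rcs_unquote(s):
    """Decode the body of an RCS @-delimited string by collapsing
    doubled '@@' back to a single '@'."""
    return s.replace('@@', '@')
```

Round-tripping any string through `rcs_quote` and then `rcs_unquote` returns the original text.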
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-fill-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/double-fill-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/double-fill-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/double-fill-cvsrepos/CVSROOT/README 2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/CVSROOT/README 2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/a.txt,v cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/a.txt,v --- cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/a.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/a.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,74 @@ +head 1.3; +access; +symbols + BRANCH2:1.2.0.4 + BRANCH1:1.2.0.2; +locks; strict; +comment @# @; + + +1.3 +date 2005.03.23.16.45.45; author author2; state Exp; +branches; +next 1.2; + +1.2 +date 2004.02.27.22.01.32; author author3; state Exp; +branches + 1.2.2.1 + 1.2.4.1; +next 1.1; + +1.1 +date 2003.12.22.20.09.23; author author4; state Exp; +branches; +next ; + +1.2.2.1 +date 2005.05.05.04.07.24; author author2; state Exp; +branches; +next ; + +1.2.4.1 +date 2005.05.05.20.56.37; author author6; state Exp; +branches; +next ; + + +desc +@@ + + +1.3 +log +@log 5@ +text +@@ + + +1.2 +log +@log 6@ +text +@@ + + +1.2.4.1 +log +@log 7@ +text +@@ + + +1.2.2.1 +log +@log 9@ +text +@@ + + +1.1 +log +@log 10@ +text +@@ diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/b.txt,v cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/b.txt,v --- cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/b.txt,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/b.txt,v 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,62 @@ +head 1.2; +access; +symbols + BRANCH2:1.1.0.4 + BRANCH1:1.1.0.2; +locks; strict; +comment @// @; + + +1.2 +date 2005.04.05.11.08.53; author author7; state Exp; +branches; +next 1.1; + +1.1 +date 2005.02.22.00.10.19; author author7; state Exp; +branches + 1.1.2.1 + 1.1.4.1; +next ; + +1.1.2.1 +date 2005.04.05.11.09.38; author author7; state Exp; +branches; +next ; + +1.1.4.1 +date 2005.04.05.22.20.39; author author6; state Exp; +branches; +next ; + + +desc +@@ + + +1.2 +log +@log 13@ +text +@@ + + +1.1 +log +@log 14@ +text +@@ + + +1.1.4.1 +log +@log 15@ +text +@@ + 
+
+1.1.2.1
+log
+@log 13@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/c.txt,v cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/c.txt,v
--- cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/c.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/c.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,50 @@
+head 1.1;
+access;
+symbols
+	BRANCH2:1.1.0.8
+	BRANCH1:1.1.0.2;
+locks; strict;
+comment @// @;
+
+
+1.1
+date 2005.10.03.17.35.55; author author8; state Exp;
+branches
+	1.1.2.1
+	1.1.8.1;
+next ;
+
+1.1.2.1
+date 2005.10.04.10.54.11; author author8; state Exp;
+branches;
+next ;
+
+1.1.8.1
+date 2005.10.05.20.46.30; author author6; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.1
+log
+@log 17@
+text
+@@
+
+
+1.1.8.1
+log
+@log 18@
+text
+@@
+
+
+1.1.2.1
+log
+@log 19@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/d.txt,v cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/d.txt,v
--- cvs2svn-1.5.x/test-data/double-fill2-cvsrepos/proj/d.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/double-fill2-cvsrepos/proj/d.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,49 @@
+head 1.2;
+access;
+symbols
+	BRANCH2:1.1.2.1.0.2
+	BRANCH1:1.1.0.2;
+locks; strict;
+comment @// @;
+
+
+1.2
+date 2005.08.17.02.27.39; author author1; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2005.02.25.18.17.06; author author8; state Exp;
+branches
+	1.1.2.1;
+next ;
+
+1.1.2.1
+date 2005.02.25.23.10.11; author author8; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@log 1@
+text
+@@
+
+
+1.1
+log
+@log 16@
+text
+@@
+
+
+1.1.2.1
+log
+@log 16@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/empty-trunk-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/empty-trunk-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/empty-trunk-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/empty-trunk-cvsrepos/CVSROOT/README	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/eol-mime-cvsrepos/README cvs2svn-2.0.0/test-data/eol-mime-cvsrepos/README
--- cvs2svn-1.5.x/test-data/eol-mime-cvsrepos/README	2004-07-27 00:12:17.000000000 +0200
+++ cvs2svn-2.0.0/test-data/eol-mime-cvsrepos/README	2007-08-15 22:53:51.000000000 +0200
@@ -6,43 +6,3 @@
 how these interactions work. The 'mime-mappings.txt' file for this
 repository is stored right here, in the repository itself. It won't
 be converted because it doesn't end with ,v.
-
-The table below says what to expect for the data here. We assume that
---mime-types is passed every time; the only question is whether
-none/one/both of --no-default-eol and --eol-from-mime-type are passed:
-
-  foo.txt (no -kb, mime file says nothing):
-    --NO flags: native eol, no mime type
-    --no-default-eol: no eol, no mime type
-    --eol-from-mime-type: native eol, no mime type
-    --BOTH flags: no eol, no mime type
-
-  foo.xml (no -kb, mime file says text):
-    --NO flags: native eol, mime type "text/blah"
-    --no-default-eol: no eol, mime type "text/blah"
-    --eol-from-mime-type: native eol, mime type "text/blah"
-    --BOTH flags: native eol, mime type "text/blah"
-
-  foo.zip (no -kb, mime file says non-text):
-    --NO flags: native eol, mime type "blah/blah"
-    --no-default-eol: no eol, mime type "blah/blah"
-    --eol-from-mime-type: native eol, mime type "blah/blah"
-    --BOTH flags: no eol, mime type "blah/blah"
-
-  foo.bin (has -kb, mime file says nothing):
-    --NO flags: no eol, mime type "application/octet-stream"
-    --no-default-eol: no eol, mime type "application/octet-stream"
-    --eol-from-mime-type: no eol, mime type "application/octet-stream"
-    --BOTH flags: no eol, mime type "application/octet-stream"
-
-  foo.csv (has -kb, mime file says text):
-    --NO flags: no eol, mime type "text/blah"
-    --no-default-eol: no eol, mime type "text/blah"
-    --eol-from-mime-type: no eol, mime type "text/blah"
-    --BOTH flags: no eol, mime type "text/blah"
-
-  foo.dbf (has -kb, mime file says non-text):
-    --NO flags: no eol, mime type "blah/blah"
-    --no-default-eol: no eol, mime type "blah/blah"
-    --eol-from-mime-type: no eol, mime type "blah/blah"
-    --BOTH flags: no eol, mime type "blah/blah"
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/file-directory-conflict-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/file-directory-conflict-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/file-directory-conflict-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/file-directory-conflict-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/file-directory-conflict-cvsrepos/proj/name/name2,v cvs2svn-2.0.0/test-data/file-directory-conflict-cvsrepos/proj/name/name2,v
--- cvs2svn-1.5.x/test-data/file-directory-conflict-cvsrepos/proj/name/name2,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/file-directory-conflict-cvsrepos/proj/name/name2,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,23 @@
+head 1.1;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.1
+date 2007.03.26.12.00.00; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.1
+log
+@Adding file "name2".
+@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/file-directory-conflict-cvsrepos/proj/name,v cvs2svn-2.0.0/test-data/file-directory-conflict-cvsrepos/proj/name,v
--- cvs2svn-1.5.x/test-data/file-directory-conflict-cvsrepos/proj/name,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/file-directory-conflict-cvsrepos/proj/name,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,23 @@
+head 1.1;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.1
+date 2007.03.26.13.00.00; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.1
+log
+@Adding file "name".
+@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/file-in-attic-too-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/file-in-attic-too-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/file-in-attic-too-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/file-in-attic-too-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/internal-co-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/internal-co-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/internal-co-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/internal-co-cvsrepos/CVSROOT/README	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/internal-co-cvsrepos/branched/Attic/somefile.txt,v cvs2svn-2.0.0/test-data/internal-co-cvsrepos/branched/Attic/somefile.txt,v
--- cvs2svn-1.5.x/test-data/internal-co-cvsrepos/branched/Attic/somefile.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/internal-co-cvsrepos/branched/Attic/somefile.txt,v	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,181 @@
+head 1.5;
+access;
+symbols
+	BRANCH_FROM_DEAD:1.5.0.2
+	BRANCH:1.1.0.2;
+locks; strict;
+comment @# @;
+
+
+1.5
+date 2007.04.05.15.32.44; author ossi; state dead;
+branches
+	1.5.2.1;
+next 1.4;
+commitid ksTEPgcwRGzBKTcs;
+
+1.4
+date 2007.04.05.15.32.23; author ossi; state Exp;
+branches;
+next 1.3;
+commitid g00K7nhwfWduKTcs;
+
+1.3
+date 2007.04.05.15.13.23; author ossi; state dead;
+branches;
+next 1.2;
+commitid 6vDsezBCNsbYDTcs;
+
+1.2
+date 2007.04.05.15.13.08; author ossi; state Exp;
+branches;
+next 1.1;
+commitid XPB93k3c4jJSDTcs;
+
+1.1
+date 2007.04.05.15.07.41; author ossi; state Exp;
+branches
+	1.1.2.1;
+next ;
+commitid jgg4E7IfvqX0CTcs;
+
+1.5.2.1
+date 2007.04.05.15.32.44; author ossi; state dead;
+branches;
+next 1.5.2.2;
+commitid bGYbKyNPiicdLTcs;
+
+1.5.2.2
+date 2007.04.05.15.34.30; author ossi; state Exp;
+branches;
+next ;
+commitid bGYbKyNPiicdLTcs;
+
+1.1.2.1
+date 2007.04.05.15.30.02; author ossi; state Exp;
+branches;
+next 1.1.2.2;
+commitid 6MkfqH2xRPLDJTcs;
+
+1.1.2.2
+date 2007.04.05.15.30.44; author ossi; state Exp;
+branches;
+next 1.1.2.3;
+commitid BSF6Cvx3cHWUJTcs;
+
+1.1.2.3
+date 2007.04.05.15.30.55; author ossi; state dead;
+branches;
+next ;
+commitid UTPOIUtFBh3ZJTcs;
+
+
+desc
+@@
+
+
+1.5
+log
+@file re-deleted on trunk
+@
+text
+@resurrected file content.
+@
+
+
+1.5.2.1
+log
+@file somefile.txt was added on branch BRANCH_FROM_DEAD on 2007-04-05 15:34:30 +0000
+@
+text
+@d1 1
+@
+
+
+1.5.2.2
+log
+@file revived on branch
+@
+text
+@a0 1
+text on branch spawning from dead revision.@
+
+
+1.4
+log
+@file resurrected on trunk
+@
+text
+@@
+
+
+1.3
+log
+@file deleted
+@
+text
+@d1 1
+a1 2
+keyword: $Id: somefile.txt,v 1.2 2007-04-05 15:13:08 ossi Exp $ now done
+this is modified file content.
+@
+
+
+1.2
+log
+@file modified
+@
+text
+@d1 1
+a1 1
+keyword: $Id: somefile.txt,v 1.1 2007-04-05 15:07:41 ossi Exp $ now done
+@
+
+
+1.1
+log
+@file added
+@
+text
+@d1 2
+a2 2
+this is file content.
+keyword: $Id: fake expaded keyword$ now done
+@
+
+
+1.1.2.1
+log
+@modified on branch
+@
+text
+@d2 1
+a2 2
+text added on branch.
+keyword: $Id: somefile.txt,v 1.1 2007-04-05 15:07:41 ossi Exp $ now done
+@
+
+
+1.1.2.2
+log
+@file modified on branch, take 2
+@
+text
+@d3 1
+a3 2
+keyword: $Id: somefile.txt,v 1.1.2.1 2007-04-05 15:30:02 ossi Exp $ now done
+more text added on branch.
+@
+
+
+1.1.2.3
+log
+@file deleted on branch
+@
+text
+@d3 1
+a3 1
+keyword: $Id: somefile.txt,v 1.1.2.2 2007-04-05 15:30:44 ossi Exp $ now done
+@
+
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/issue-100-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/issue-100-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/issue-100-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/issue-100-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/issue-106-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/issue-106-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/issue-106-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/issue-106-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/issue-99-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/issue-99-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/issue-99-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/issue-99-cvsrepos/CVSROOT/README	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/leftover-revs-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/leftover-revs-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/leftover-revs-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/leftover-revs-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/leftover-revs-cvsrepos/file.txt,v cvs2svn-2.0.0/test-data/leftover-revs-cvsrepos/file.txt,v
--- cvs2svn-1.5.x/test-data/leftover-revs-cvsrepos/file.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/leftover-revs-cvsrepos/file.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,60 @@
+head 1.2;
+access;
+symbols
+	BRANCH:1.2.0.2;
+locks; strict;
+comment @// @;
+
+
+1.2
+date 2003.07.04.16.36.26; author author4; state dead;
+branches
+	1.2.2.1;
+next 1.1;
+
+1.1
+date 2003.07.04.16.13.47; author author4; state Exp;
+branches;
+next ;
+
+1.2.2.1
+date 2003.07.13.07.12.30; author author15; state Exp;
+branches;
+next 1.2.2.2;
+
+1.2.2.2
+date 2004.04.09.01.57.02; author author15; state dead;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@log 1670@
+text
+@@
+
+
+1.2.2.1
+log
+@log 1950@
+text
+@@
+
+
+1.2.2.2
+log
+@log 11@
+text
+@@
+
+
+1.1
+log
+@log 1671@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/main-cvsrepos/cvs2svn.options cvs2svn-2.0.0/test-data/main-cvsrepos/cvs2svn.options
--- cvs2svn-1.5.x/test-data/main-cvsrepos/cvs2svn.options	2006-09-03 23:10:57.000000000 +0200
+++ cvs2svn-2.0.0/test-data/main-cvsrepos/cvs2svn.options	2007-08-15 22:53:51.000000000 +0200
@@ -8,6 +8,6 @@
 execfile('cvs2svn-example.options')
 
 ctx.output_option = NewRepositoryOutputOption(
-    'tmp/main--options=cvs2svn.options-svnrepos',
+    'cvs2svn-tmp/main--options=cvs2svn.options-svnrepos',
     )
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/a.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/a.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/a.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/a.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,73 @@
+head 1.4;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.4
+date 2006.10.30.22.11.43; author mhagger; state Exp;
+branches;
+next 1.3;
+
+1.3
+date 2006.10.30.22.11.42; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.41; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.40; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.4
+log
+@AB-double-passthru-loop-D
+@
+text
+@1.4
+@
+
+
+1.3
+log
+@AB-double-passthru-loop-C
+@
+text
+@d1 1
+a1 1
+1.3
+@
+
+
+1.2
+log
+@AB-double-passthru-loop-B
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@AB-double-passthru-loop-A
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/b.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/b.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/b.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-double-passthru-loop/b.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,73 @@
+head 1.4;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.4
+date 2006.10.30.22.11.47; author mhagger; state Exp;
+branches;
+next 1.3;
+
+1.3
+date 2006.10.30.22.11.46; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.45; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.44; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.4
+log
+@AB-double-passthru-loop-B
+@
+text
+@1.4
+@
+
+
+1.3
+log
+@AB-double-passthru-loop-A
+@
+text
+@d1 1
+a1 1
+1.3
+@
+
+
+1.2
+log
+@AB-double-passthru-loop-D
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@AB-double-passthru-loop-C
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-loop/a.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-loop/a.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-loop/a.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-loop/a.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,41 @@
+head 1.2;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.2
+date 2006.10.30.22.11.10; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.09; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@AB-loop-B
+@
+text
+@1.2
+@
+
+
+1.1
+log
+@AB-loop-A
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-loop/b.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-loop/b.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/AB-loop/b.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/AB-loop/b.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,41 @@
+head 1.2;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.2
+date 2006.10.30.22.11.12; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.11; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@AB-loop-A
+@
+text
+@1.2
+@
+
+
+1.1
+log
+@AB-loop-B
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-loop/a.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-loop/a.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-loop/a.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-loop/a.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,41 @@
+head 1.2;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.2
+date 2006.10.30.22.11.14; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.13; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@ABC-loop-B
+@
+text
+@1.2
+@
+
+
+1.1
+log
+@ABC-loop-A
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-loop/b.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-loop/b.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-loop/b.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-loop/b.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,41 @@
+head 1.2;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.2
+date 2006.10.30.22.11.16; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.15; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@ABC-loop-C
+@
+text
+@1.2
+@
+
+
+1.1
+log
+@ABC-loop-B
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-loop/c.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-loop/c.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-loop/c.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-loop/c.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,41 @@
+head 1.2;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.2
+date 2006.10.30.22.11.18; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.17; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.2
+log
+@ABC-loop-A
+@
+text
+@1.2
+@
+
+
+1.1
+log
+@ABC-loop-C
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/a.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/a.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/a.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/a.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.21; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.20; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.19; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABC-passthru-loop-C
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABC-passthru-loop-B
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABC-passthru-loop-A
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/b.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/b.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/b.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/b.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.24; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.23; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.22; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABC-passthru-loop-A
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABC-passthru-loop-C
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABC-passthru-loop-B
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/c.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/c.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/c.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABC-passthru-loop/c.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.27; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.26; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.25; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABC-passthru-loop-B
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABC-passthru-loop-A
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABC-passthru-loop-C
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/a.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/a.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/a.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/a.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.30; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.29; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.28; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABCD-passthru-loop-C
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABCD-passthru-loop-B
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABCD-passthru-loop-A
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/b.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/b.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/b.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/b.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.33; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.32; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.31; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABCD-passthru-loop-D
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABCD-passthru-loop-C
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABCD-passthru-loop-B
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/c.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/c.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/c.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/c.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.36; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.35; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.34; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABCD-passthru-loop-A
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABCD-passthru-loop-D
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABCD-passthru-loop-C
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/d.txt,v cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/d.txt,v
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/d.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/ABCD-passthru-loop/d.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,57 @@
+head 1.3;
+access;
+symbols;
+locks; strict;
+comment @# @;
+
+
+1.3
+date 2006.10.30.22.11.39; author mhagger; state Exp;
+branches;
+next 1.2;
+
+1.2
+date 2006.10.30.22.11.38; author mhagger; state Exp;
+branches;
+next 1.1;
+
+1.1
+date 2006.10.30.22.11.37; author mhagger; state Exp;
+branches;
+next ;
+
+
+desc
+@@
+
+
+1.3
+log
+@ABCD-passthru-loop-B
+@
+text
+@1.3
+@
+
+
+1.2
+log
+@ABCD-passthru-loop-A
+@
+text
+@d1 1
+a1 1
+1.2
+@
+
+
+1.1
+log
+@ABCD-passthru-loop-D
+@
+text
+@d1 1
+a1 1
+1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/make-nasty-graphs.sh cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/make-nasty-graphs.sh
--- cvs2svn-1.5.x/test-data/nasty-graphs-cvsrepos/make-nasty-graphs.sh	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/nasty-graphs-cvsrepos/make-nasty-graphs.sh	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,216 @@
+#! /bin/sh
+
+# This script can be moved to the test-data directory and executed
+# there to recreate nasty-graphs-cvsrepos. (Well, approximately. It
+# doesn't clean up CVSROOT or add CVSROOT/README.)
+
+CVSROOT=`pwd`/nasty-graphs-cvsrepos
+export CVSROOT
+rm -rf $CVSROOT
+
+WC=`pwd`/cvs2svn-tmp
+rm -rf $WC
+
+cvs init
+
+cvs co -d $WC .
+
+
+# +-> A -> B --+
+# |            |
+# +------------+
+#
+# A: a.txt<1.1> b.txt<1.2>
+# B: a.txt<1.2> b.txt<1.1>
+
+TEST=AB-loop
+D=$WC/$TEST
+
+mkdir $D
+cvs add $D
+
+echo "1.1" >$D/a.txt
+cvs add $D/a.txt
+cvs commit -m "$TEST-A" $D/a.txt
+
+echo "1.2" >$D/a.txt
+cvs commit -m "$TEST-B" $D/a.txt
+
+echo "1.1" >$D/b.txt
+cvs add $D/b.txt
+cvs commit -m "$TEST-B" $D/b.txt
+
+echo "1.2" >$D/b.txt
+cvs commit -m "$TEST-A" $D/b.txt
+
+
+# +-> A -> B -> C --+
+# |                 |
+# +-----------------+
+#
+# A: a.txt<1.1> c.txt<1.2>
+# B: a.txt<1.2> b.txt<1.1>
+# C: b.txt<1.2> c.txt<1.1>
+
+TEST=ABC-loop
+D=$WC/$TEST
+
+mkdir $D
+cvs add $D
+
+echo "1.1" >$D/a.txt
+cvs add $D/a.txt
+cvs commit -m "$TEST-A" $D/a.txt
+
+echo "1.2" >$D/a.txt
+cvs commit -m "$TEST-B" $D/a.txt
+
+echo "1.1" >$D/b.txt
+cvs add $D/b.txt
+cvs commit -m "$TEST-B" $D/b.txt
+
+echo "1.2" >$D/b.txt
+cvs commit -m "$TEST-C" $D/b.txt
+
+echo "1.1" >$D/c.txt
+cvs add $D/c.txt
+cvs commit -m "$TEST-C" $D/c.txt
+
+echo "1.2" >$D/c.txt
+cvs commit -m "$TEST-A" $D/c.txt
+
+
+# A: a.txt<1.1> b.txt<1.3> c.txt<1.2>
+# B: a.txt<1.2> b.txt<1.1> c.txt<1.3>
+# C: a.txt<1.3> b.txt<1.2> c.txt<1.1>
+
+TEST=ABC-passthru-loop
+D=$WC/$TEST
+
+mkdir $D
+cvs add $D
+
+echo "1.1" >$D/a.txt
+cvs add $D/a.txt
+cvs commit -m "$TEST-A" $D/a.txt
+
+echo "1.2" >$D/a.txt
+cvs commit -m "$TEST-B" $D/a.txt
+
+echo "1.3" >$D/a.txt
+cvs commit -m "$TEST-C" $D/a.txt
+
+echo "1.1" >$D/b.txt
+cvs add $D/b.txt
+cvs commit -m "$TEST-B" $D/b.txt
+
+echo "1.2" >$D/b.txt
+cvs commit -m "$TEST-C" $D/b.txt
+
+echo "1.3" >$D/b.txt
+cvs commit -m "$TEST-A" $D/b.txt
+
+echo "1.1" >$D/c.txt
+cvs add $D/c.txt
+cvs commit -m "$TEST-C" $D/c.txt
+
+echo "1.2" >$D/c.txt
+cvs commit -m "$TEST-A" $D/c.txt
+
+echo "1.3" >$D/c.txt
+cvs commit -m "$TEST-B" $D/c.txt
+
+
+# A: a.txt<1.1> c.txt<1.3> d.txt<1.2>
+# B: a.txt<1.2> b.txt<1.1> d.txt<1.3>
+# C: a.txt<1.3> b.txt<1.2> c.txt<1.1>
+# D: b.txt<1.3> c.txt<1.2> d.txt<1.1>
+
+TEST=ABCD-passthru-loop
+D=$WC/$TEST
+
+mkdir $D
+cvs add $D
+
+echo "1.1" >$D/a.txt
+cvs add $D/a.txt
+cvs commit -m "$TEST-A" $D/a.txt
+
+echo "1.2" >$D/a.txt
+cvs commit -m "$TEST-B" $D/a.txt
+
+echo "1.3" >$D/a.txt
+cvs commit -m "$TEST-C" $D/a.txt
+
+echo "1.1" >$D/b.txt
+cvs add $D/b.txt
+cvs commit -m "$TEST-B" $D/b.txt
+
+echo "1.2" >$D/b.txt
+cvs commit -m "$TEST-C" $D/b.txt
+
+echo "1.3" >$D/b.txt
+cvs commit -m "$TEST-D" $D/b.txt
+
+echo "1.1" >$D/c.txt
+cvs add $D/c.txt
+cvs commit -m "$TEST-C" $D/c.txt
+
+echo "1.2" >$D/c.txt
+cvs commit -m "$TEST-D" $D/c.txt
+
+echo "1.3" >$D/c.txt
+cvs commit -m "$TEST-A" $D/c.txt
+
+echo "1.1" >$D/d.txt
+cvs add $D/d.txt
+cvs commit -m "$TEST-D" $D/d.txt
+
+echo "1.2" >$D/d.txt
+cvs commit -m "$TEST-A" $D/d.txt
+
+echo "1.3" >$D/d.txt
+cvs commit -m "$TEST-B" $D/d.txt
+
+
+# The following test has the nasty property that each changeset has
+# either one LINK_PREV or LINK_SUCC and also one LINK_PASSTHRU.
+#
+# A: a.txt<1.1> b.txt<1.3>
+# B: a.txt<1.2> b.txt<1.4>
+# C: a.txt<1.3> b.txt<1.1>
+# D: a.txt<1.4> b.txt<1.2>
+
+TEST=AB-double-passthru-loop
+D=$WC/$TEST
+
+mkdir $D
+cvs add $D
+
+echo "1.1" >$D/a.txt
+cvs add $D/a.txt
+cvs commit -m "$TEST-A" $D/a.txt
+
+echo "1.2" >$D/a.txt
+cvs commit -m "$TEST-B" $D/a.txt
+
+echo "1.3" >$D/a.txt
+cvs commit -m "$TEST-C" $D/a.txt
+
+echo "1.4" >$D/a.txt
+cvs commit -m "$TEST-D" $D/a.txt
+
+echo "1.1" >$D/b.txt
+cvs add $D/b.txt
+cvs commit -m "$TEST-C" $D/b.txt
+
+echo "1.2" >$D/b.txt
+cvs commit -m "$TEST-D" $D/b.txt
+
+echo "1.3" >$D/b.txt
+cvs commit -m "$TEST-A" $D/b.txt
+
+echo "1.4" >$D/b.txt
+cvs commit -m "$TEST-B" $D/b.txt
+
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/native-eol-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/native-eol-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/native-eol-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/native-eol-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/overlapping-branch-cvsrepos/nonoverlapping-branch,v cvs2svn-2.0.0/test-data/overlapping-branch-cvsrepos/nonoverlapping-branch,v
--- cvs2svn-1.5.x/test-data/overlapping-branch-cvsrepos/nonoverlapping-branch,v	2004-01-14 06:46:05.000000000 +0100
+++ cvs2svn-2.0.0/test-data/overlapping-branch-cvsrepos/nonoverlapping-branch,v	2007-08-15 22:53:49.000000000 +0200
@@ -25,11 +25,11 @@ desc
 
 
 1.1
 log
-@The content of this file is unimportant, what matters is that it
-has one branch.
+@Initial revision
 @
 text
-@Nothing to see here.
+@The content of this file is unimportant, what matters is that it
+has one branch.
 @
 
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/overlapping-branch-cvsrepos/overlapping-branch,v cvs2svn-2.0.0/test-data/overlapping-branch-cvsrepos/overlapping-branch,v
--- cvs2svn-1.5.x/test-data/overlapping-branch-cvsrepos/overlapping-branch,v	2004-01-20 00:15:59.000000000 +0100
+++ cvs2svn-2.0.0/test-data/overlapping-branch-cvsrepos/overlapping-branch,v	2007-08-15 22:53:49.000000000 +0200
@@ -26,13 +26,13 @@ desc
 
 
 1.1
 log
+@Initial revision
+@
+text
 @The content of this file is unimportant, what matters is that the
 same branch has two different symbolic names, a condition which
 cvs2svn.py should warn about.
 @
-text
-@Nothing to see here.
-@
 
 1.1.1.1
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/pass5-when-to-fill-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/pass5-when-to-fill-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/pass5-when-to-fill-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/pass5-when-to-fill-cvsrepos/CVSROOT/README	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/CVSROOT/README	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository. Without it, CVS operations fail with an error like:
+
+  cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/dir/file1,v cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/dir/file1,v --- cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/dir/file1,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/dir/file1,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,46 @@ +head 1.1; +access; +symbols + C:1.1.2.1.0.6 + B:1.1.2.1.0.4 + A:1.1.2.1.0.2 + X:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.04.22.16.35.58; author mhagger; state Exp; +branches + 1.1.2.1; +next ; + +1.1.2.1 +date 2007.04.22.16.35.59; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding files on trunk +@ +text +@1.1 +@ + + +1.1.2.1 +log +@Adding revision on first-level branches +@ +text +@d1 1 +a1 1 +1.1.2.1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/dir/file2,v cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/dir/file2,v --- cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/dir/file2,v 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/dir/file2,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,46 @@ +head 1.1; +access; +symbols + A:1.1.2.1.0.6 + C:1.1.2.1.0.4 + B:1.1.2.1.0.2 + Y:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.04.22.16.35.58; author mhagger; state Exp; +branches + 1.1.2.1; +next ; + +1.1.2.1 +date 2007.04.22.16.35.59; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding files on trunk +@ +text +@1.1 +@ + + +1.1.2.1 +log +@Adding revision on first-level branches +@ +text +@d1 1 +a1 1 +1.1.2.1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/dir/file3,v cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/dir/file3,v --- cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/dir/file3,v 1970-01-01 01:00:00.000000000 +0100 +++ 
cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/dir/file3,v 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,46 @@ +head 1.1; +access; +symbols + B:1.1.2.1.0.6 + A:1.1.2.1.0.4 + C:1.1.2.1.0.2 + Z:1.1.0.2; +locks; strict; +comment @# @; + + +1.1 +date 2007.04.22.16.35.58; author mhagger; state Exp; +branches + 1.1.2.1; +next ; + +1.1.2.1 +date 2007.04.22.16.35.59; author mhagger; state Exp; +branches; +next ; + + +desc +@@ + + +1.1 +log +@Adding files on trunk +@ +text +@1.1 +@ + + +1.1.2.1 +log +@Adding revision on first-level branches +@ +text +@d1 1 +a1 1 +1.1.2.1 +@ + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/makerepo.sh cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/makerepo.sh --- cvs2svn-1.5.x/test-data/preferred-parent-cycle-cvsrepos/makerepo.sh 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/preferred-parent-cycle-cvsrepos/makerepo.sh 2007-08-15 22:53:51.000000000 +0200 @@ -0,0 +1,116 @@ +#! /bin/sh + +# This is the script used to create the preferred-parent-cycle CVS +# repository. (The repository is checked into svn; this script is +# only here for its documentation value.) +# +# The script should be started from the main cvs2svn directory. +# +# The branching structure of the three files in this repository is +# constructed to create a loop in the preferred parent of each branch +# A, B, and C. 
The branches are as follows ('*' marks revisions, +# which are used to prevent trunk from being a possible parent of +# branches A, B, or C): +# +# file1: +# --*--+-------------------- trunk +# | +# +--*--+-------------- branch X +# | +# +-------------- branch A +# | +# +-------------- branch B +# | +# +-------------- branch C +# +# file2: +# --*--+-------------------- trunk +# | +# +--*--+-------------- branch Y +# | +# +-------------- branch B +# | +# +-------------- branch C +# | +# +-------------- branch A +# +# file3: +# --*--+-------------------- trunk +# | +# +--*--+-------------- branch Z +# | +# +-------------- branch C +# | +# +-------------- branch A +# | +# +-------------- branch B +# + +# Note that the possible parents of A are (X, Y, Z, C*2, B*1), those +# of B are (X, Y, Z, A*2, C*1), and those of C are (X, Y, Z, B*2, +# A*1). Therefore the preferred parents form a cycle A -> C -> B -> +# A. + +repo=`pwd`/test-data/preferred-parent-cycle-cvsrepos +wc=`pwd`/cvs2svn-tmp/preferred-parent-cycle-wc +[ -e $repo/CVSROOT ] && rm -rf $repo/CVSROOT +[ -e $repo/dir ] && rm -rf $repo/dir +[ -e $wc ] && rm -rf $wc + +cvs -d $repo init +cvs -d $repo co -d $wc . +cd $wc +mkdir dir +cvs add dir +cd dir +echo '1.1' >file1 +echo '1.1' >file2 +echo '1.1' >file3 +cvs add file1 file2 file3 +cvs commit -m 'Adding files on trunk' . + + +cvs tag -b X file1 +cvs up -r X file1 + +cvs tag -b Y file2 +cvs up -r Y file2 + +cvs tag -b Z file3 +cvs up -r Z file3 + +echo '1.1.2.1' >file1 +echo '1.1.2.1' >file2 +echo '1.1.2.1' >file3 +cvs commit -m 'Adding revision on first-level branches' . 
+ + +cvs tag -b A file1 +cvs up -r A file1 + +cvs tag -b B file1 +cvs up -r B file1 + +cvs tag -b C file1 +cvs up -r C file1 + + +cvs tag -b B file2 +cvs up -r B file2 + +cvs tag -b C file2 +cvs up -r C file2 + +cvs tag -b A file2 +cvs up -r A file2 + + +cvs tag -b C file3 +cvs up -r C file3 + +cvs tag -b A file3 +cvs up -r A file3 + +cvs tag -b B file3 +cvs up -r B file3 + diff -purNbBwx .svn cvs2svn-1.5.x/test-data/repeated-deltatext-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/repeated-deltatext-cvsrepos/CVSROOT/README --- cvs2svn-1.5.x/test-data/repeated-deltatext-cvsrepos/CVSROOT/README 1970-01-01 01:00:00.000000000 +0100 +++ cvs2svn-2.0.0/test-data/repeated-deltatext-cvsrepos/CVSROOT/README 2007-08-15 22:53:49.000000000 +0200 @@ -0,0 +1,8 @@ +This CVSROOT/ directory is only here to convince CVS that this is a +real repository. Without it, CVS operations fail with an error like: + + cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory + +Of course, CVS doesn't seem to require that there actually be any +files in CVSROOT/, which kind of makes one wonder why it cares about +the directory at all. 
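The makerepo.sh comment above states that the possible parents of A are (X, Y, Z, C*2, B*1), those of B are (X, Y, Z, A*2, C*1), and those of C are (X, Y, Z, B*2, A*1), so the preferred parents form a cycle A -> C -> B -> A. The counting can be sketched as follows; this is an illustration of the arithmetic in that comment, not cvs2svn's actual symbol-resolution code, and the `orders` table is a hand-made summary of the three test files:

```python
from collections import Counter

# Order in which branches A, B, C are created in each test file; a
# branch created earlier on the same revision is a possible parent of
# any branch created later (X, Y, Z are possible parents everywhere,
# so they never break a tie and are omitted here).
orders = {
    'file1': ['A', 'B', 'C'],
    'file2': ['B', 'C', 'A'],
    'file3': ['C', 'A', 'B'],
}

votes = {b: Counter() for b in 'ABC'}
for order in orders.values():
    for i, branch in enumerate(order):
        for earlier in order[:i]:
            votes[branch][earlier] += 1

# The most frequent possible parent of each branch is its preferred parent.
preferred = {b: votes[b].most_common(1)[0][0] for b in 'ABC'}
print(preferred)  # {'A': 'C', 'B': 'A', 'C': 'B'}
```

Following the `preferred` map from A gives A -> C -> B -> A, the cycle this test repository is designed to provoke.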
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/repeated-deltatext-cvsrepos/file.txt,v cvs2svn-2.0.0/test-data/repeated-deltatext-cvsrepos/file.txt,v
--- cvs2svn-1.5.x/test-data/repeated-deltatext-cvsrepos/file.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/repeated-deltatext-cvsrepos/file.txt,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,63 @@
+head	1.3;
+access;
+symbols
+	uves_2_0_0:1.3
+	UVES-1_3_0-beta1a:1.1;
+locks; strict;
+comment	@# @;
+
+
+1.3
+date	2003.06.30.12.59.08;	author amodigli;	state Exp;
+branches;
+next	1.2;
+
+1.2
+date	2003.06.30.12.54.52;	author amodigli;	state Exp;
+branches;
+next	1.1;
+
+1.1
+date	2002.02.11.17.59.51;	author amodigli;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.3
+log
+@uves-2.0.0-rep
+@
+text
+@      COMMON /QC_LOG/MID_S_N_CENT,OBJ_POS_CENT,
+     +     FWHM,N_CURR_ORD   !to not pass a parameter to G_PROF
+@
+
+
+1.2
+log
+@uves-2.0.0
+@
+text
+@@
+
+
+1.1
+log
+@1st release
+@
+text
+@@
+
+
+1.1
+log
+@Created
+@
+text
+@      COMMON /QC_LOG/MID_S_N_CENT,OBJ_POS_CENT,
+     +     FWHM,N_CURR_ORD   !to not pass a parameter to G_PROF
+@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/resync-bug-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/resync-bug-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/resync-bug-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/resync-bug-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
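The file.txt,v above is the corruption this test exercises: revision 1.1 appears twice in the deltatext section (once with log "1st release", once with log "Created"). A converter has to notice the duplicate rather than silently using one copy. A minimal sketch of such a check, using a trimmed-down stand-in for the deltatext headers rather than cvs2svn's actual RCS parser:

```python
import re
from collections import Counter

# Trimmed-down copy of the deltatext section of file.txt,v above;
# illustrative only, not the full RCS grammar.
rcs_deltatext = """\
1.3
log
@uves-2.0.0-rep
@
1.2
log
@uves-2.0.0
@
1.1
log
@1st release
@
1.1
log
@Created
@
"""

# A deltatext block starts with a revision number on its own line,
# immediately followed by the 'log' keyword.
revisions = re.findall(r'^(\d+(?:\.\d+)+)\nlog$', rcs_deltatext, re.MULTILINE)
repeated = [rev for rev, count in Counter(revisions).items() if count > 1]
print(repeated)  # ['1.1']
```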
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/resync-pass2-pull-forward-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/resync-pass2-pull-forward-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/resync-pass2-pull-forward-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/resync-pass2-pull-forward-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/resync-pass2-push-backward-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/resync-pass2-push-backward-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/resync-pass2-push-backward-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/resync-pass2-push-backward-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/revision-reorder-bug-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/revision-reorder-bug-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/revision-reorder-bug-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/revision-reorder-bug-cvsrepos/CVSROOT/README	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/symbol-mess-cvsrepos/makerepo.sh cvs2svn-2.0.0/test-data/symbol-mess-cvsrepos/makerepo.sh
--- cvs2svn-1.5.x/test-data/symbol-mess-cvsrepos/makerepo.sh	2006-06-25 01:32:05.000000000 +0200
+++ cvs2svn-2.0.0/test-data/symbol-mess-cvsrepos/makerepo.sh	2007-08-15 22:53:49.000000000 +0200
@@ -7,7 +7,7 @@
 # The script should be started from the main cvs2svn directory.
 
 repo=`pwd`/test-data/symbol-mess-cvsrepos
-wc=`pwd`/tmp/symbol-mess-wc
+wc=`pwd`/cvs2svn-tmp/symbol-mess-wc
 [ -e $repo/CVSROOT ] && rm -rf $repo/CVSROOT
 [ -e $repo/dir ] && rm -rf $repo/dir
 [ -e $wc ] && rm -rf $wc
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/symbolic-name-overfill-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/symbolic-name-overfill-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/symbolic-name-overfill-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/symbolic-name-overfill-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/symlinks-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/symlinks-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/symlinks-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/symlinks-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/symlinks-cvsrepos/proj/dir1/file.txt,v cvs2svn-2.0.0/test-data/symlinks-cvsrepos/proj/dir1/file.txt,v
--- cvs2svn-1.5.x/test-data/symlinks-cvsrepos/proj/dir1/file.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/symlinks-cvsrepos/proj/dir1/file.txt,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,22 @@
+head	1.1;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.1
+date	2007.04.08.08.10.10;	author mhagger;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.1
+log
+@@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/symlinks-cvsrepos/proj/dir2/file.txt,v cvs2svn-2.0.0/test-data/symlinks-cvsrepos/proj/dir2/file.txt,v
--- cvs2svn-1.5.x/test-data/symlinks-cvsrepos/proj/dir2/file.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/symlinks-cvsrepos/proj/dir2/file.txt,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,22 @@
+head	1.1;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.1
+date	2007.04.08.08.10.10;	author mhagger;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.1
+log
+@@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/symlinks-cvsrepos/proj/file.txt,v cvs2svn-2.0.0/test-data/symlinks-cvsrepos/proj/file.txt,v
--- cvs2svn-1.5.x/test-data/symlinks-cvsrepos/proj/file.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/symlinks-cvsrepos/proj/file.txt,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,22 @@
+head	1.1;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.1
+date	2007.04.08.08.10.10;	author mhagger;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.1
+log
+@@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/tag-with-no-revision-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/tag-with-no-revision-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/tag-with-no-revision-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/tag-with-no-revision-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/tag-with-no-revision-cvsrepos/README cvs2svn-2.0.0/test-data/tag-with-no-revision-cvsrepos/README
--- cvs2svn-1.5.x/test-data/tag-with-no-revision-cvsrepos/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/tag-with-no-revision-cvsrepos/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,3 @@
+This repository is corrupt in a way that appears rather common among
+our users.  It contains a tag and a branch that refer to revision
+1.1.2.1, but revision 1.1.2.1 does not exist.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/tag-with-no-revision-cvsrepos/file.txt,v cvs2svn-2.0.0/test-data/tag-with-no-revision-cvsrepos/file.txt,v
--- cvs2svn-1.5.x/test-data/tag-with-no-revision-cvsrepos/file.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/tag-with-no-revision-cvsrepos/file.txt,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,48 @@
+head	1.3;
+access;
+symbols
+	TAG:1.1.2.1
+	SUBBRANCH:1.1.2.1.0.2;
+locks; strict;
+comment	@// @;
+
+
+1.3
+date	2004.03.19.19.47.32;	author joeschmo;	state dead;
+branches;
+next	1.2;
+
+1.2
+date	98.06.22.21.46.37;	author joeschmo;	state Exp;
+branches;
+next	1.1;
+
+1.1
+date	98.06.05.10.40.26;	author joeschmo;	state dead;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.3
+log
+@@
+text
+@@
+
+
+1.2
+log
+@@
+text
+@@
+
+
+1.1
+log
+@@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/tagging-after-delete-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/tagging-after-delete-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/tagging-after-delete-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/tagging-after-delete-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/tagging-after-delete-cvsrepos/test/Attic/b,v cvs2svn-2.0.0/test-data/tagging-after-delete-cvsrepos/test/Attic/b,v
--- cvs2svn-1.5.x/test-data/tagging-after-delete-cvsrepos/test/Attic/b,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/tagging-after-delete-cvsrepos/test/Attic/b,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,38 @@
+head	1.2;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.2
+date	2006.11.28.18.18.39;	author mark;	state dead;
+branches;
+next	1.1;
+
+1.1
+date	2006.11.28.18.18.24;	author mark;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.2
+log
+@removed file b
+@
+text
+@file b
+@
+
+
+1.1
+log
+@added files
+@
+text
+@@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/tagging-after-delete-cvsrepos/test/a,v cvs2svn-2.0.0/test-data/tagging-after-delete-cvsrepos/test/a,v
--- cvs2svn-1.5.x/test-data/tagging-after-delete-cvsrepos/test/a,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/tagging-after-delete-cvsrepos/test/a,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,25 @@
+head	1.1;
+access;
+symbols
+	tag1:1.1;
+locks; strict;
+comment	@# @;
+
+
+1.1
+date	2006.11.28.18.18.23;	author mark;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.1
+log
+@added files
+@
+text
+@file a
+@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/timestamp-chaos-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/timestamp-chaos-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/timestamp-chaos-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/timestamp-chaos-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/timestamp-chaos-cvsrepos/proj/file1.txt,v cvs2svn-2.0.0/test-data/timestamp-chaos-cvsrepos/proj/file1.txt,v
--- cvs2svn-1.5.x/test-data/timestamp-chaos-cvsrepos/proj/file1.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/timestamp-chaos-cvsrepos/proj/file1.txt,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,57 @@
+head	1.3;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.3
+date	2007.01.01.22.00.00;	author mhagger;	state Exp;
+branches;
+next	1.2;
+
+1.2
+date	2000.01.01.00.00.00;	author mhagger;	state Exp;
+branches;
+next	1.1;
+
+1.1
+date	2007.01.01.21.00.00;	author mhagger;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.3
+log
+@Revision 1.3
+@
+text
+@Revision 1.3
+@
+
+
+1.2
+log
+@Revision 1.2
+@
+text
+@d1 1
+a1 1
+Revision 1.2
+@
+
+
+1.1
+log
+@Revision 1.1
+@
+text
+@d1 1
+a1 1
+Revision 1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/timestamp-chaos-cvsrepos/proj/file2.txt,v cvs2svn-2.0.0/test-data/timestamp-chaos-cvsrepos/proj/file2.txt,v
--- cvs2svn-1.5.x/test-data/timestamp-chaos-cvsrepos/proj/file2.txt,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/timestamp-chaos-cvsrepos/proj/file2.txt,v	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,57 @@
+head	1.3;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.3
+date	2007.01.01.22.00.00;	author mhagger;	state Exp;
+branches;
+next	1.2;
+
+1.2
+date	2030.01.01.00.00.00;	author mhagger;	state Exp;
+branches;
+next	1.1;
+
+1.1
+date	2007.01.01.21.00.00;	author mhagger;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.3
+log
+@Revision 1.3
+@
+text
+@Revision 1.3
+@
+
+
+1.2
+log
+@Revision 1.2
+@
+text
+@d1 1
+a1 1
+Revision 1.2
+@
+
+
+1.1
+log
+@Revision 1.1
+@
+text
+@d1 1
+a1 1
+Revision 1.1
+@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/trunk-readd-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/trunk-readd-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/trunk-readd-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/trunk-readd-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
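In the timestamp-chaos test data above, file2.txt,v dates revision 1.2 in the year 2030 even though it sits between two revisions from January 2007, so a converter must repair the timestamp before ordering commits. The clamping below is an illustrative sketch of one way to restore a monotonic history, not cvs2svn's actual resynchronization algorithm:

```python
from datetime import datetime, timedelta

def parse_rcs_date(s):
    # RCS dates in the files above look like 2007.01.01.22.00.00 (UTC).
    return datetime.strptime(s, '%Y.%m.%d.%H.%M.%S')

# Trunk chain from file2.txt,v, head (newest) to oldest; 1.2 claims
# a date in 2030.
chain = [
    ('1.3', '2007.01.01.22.00.00'),
    ('1.2', '2030.01.01.00.00.00'),
    ('1.1', '2007.01.01.21.00.00'),
]

fixed = {}
previous = None
for rev, date in chain:
    t = parse_rcs_date(date)
    if previous is not None and t >= previous:
        # An older revision may not postdate a newer one; clamp it to
        # just before its successor.
        t = previous - timedelta(seconds=1)
    fixed[rev] = t
    previous = t

print(fixed['1.2'])  # 2007-01-01 21:59:59
```

Revision 1.1's genuine timestamp (21:00) already precedes the clamped 1.2, so it is left untouched.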
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/trunk-readd-cvsrepos/root/b_file,v cvs2svn-2.0.0/test-data/trunk-readd-cvsrepos/root/b_file,v
--- cvs2svn-1.5.x/test-data/trunk-readd-cvsrepos/root/b_file,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/trunk-readd-cvsrepos/root/b_file,v	2007-08-15 22:53:51.000000000 +0200
@@ -0,0 +1,55 @@
+head	1.2;
+access;
+symbols
+	mytag:1.1.2.1
+	mybranch:1.1.0.2;
+locks; strict;
+comment	@# @;
+
+
+1.2
+date	2007.06.17.18.48.11;	author mhagger;	state Exp;
+branches;
+next	1.1;
+
+1.1
+date	2004.06.05.14.14.45;	author max;	state dead;
+branches
+	1.1.2.1;
+next	;
+
+1.1.2.1
+date	2004.06.05.14.14.45;	author max;	state Exp;
+branches;
+next	;
+
+
+desc
+@@
+
+
+1.2
+log
+@Re-adding b_file on trunk
+@
+text
+@b_file 1.2
+@
+
+
+1.1
+log
+@file b_file was initially added on branch mybranch.
+@
+text
+@d1 1
+@
+
+
+1.1.2.1
+log
+@Add b_file
+@
+text
+@@
+
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/unicode-log-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/unicode-log-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/unicode-log-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/unicode-log-cvsrepos/CVSROOT/README	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
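The b_file,v above shows how CVS records a file first added on a branch and later re-added on trunk: revision 1.1 is in state `dead` (log: "file b_file was initially added on branch mybranch.") while the live content starts at 1.1.2.1. Recognizing that pattern can be sketched as follows; the `revisions` dict is a hand-made summary of b_file,v, not the output of any cvs2svn API:

```python
# Per-revision summary of b_file,v: state plus the branch revisions
# sprouting from each revision.
revisions = {
    '1.1':     {'state': 'dead', 'branches': ['1.1.2.1']},
    '1.1.2.1': {'state': 'Exp',  'branches': []},
    '1.2':     {'state': 'Exp',  'branches': []},
}

def added_on_branch(revs):
    """Heuristic: a dead 1.1 with a live branch revision hanging off it
    means the file was initially added on a branch, not on trunk."""
    first = revs.get('1.1')
    return (first is not None
            and first['state'] == 'dead'
            and any(revs[b]['state'] == 'Exp' for b in first['branches']))

print(added_on_branch(revisions))  # True
```

Revision 1.2 ("Re-adding b_file on trunk") is then the first trunk revision with real content, which is exactly the re-add case this test repository covers.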
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/unicode-log-cvsrepos/testunicode,v cvs2svn-2.0.0/test-data/unicode-log-cvsrepos/testunicode,v
--- cvs2svn-1.5.x/test-data/unicode-log-cvsrepos/testunicode,v	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/unicode-log-cvsrepos/testunicode,v	2007-08-15 22:53:49.000000000 +0200
@@ -0,0 +1,24 @@
+head	1.1;
+access;
+symbols;
+locks; strict;
+comment	@# @;
+
+
+1.1
+date	2007.02.13.21.13.21;	author kylo;	state Exp;
+branches ;
+next	;
+
+
+desc
+@@
+
+
+
+1.1
+log
+@This is a test message with unicode: å
+@
+text
+@@
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/vendor-branch-delete-add-cvsrepos/CVSROOT/README cvs2svn-2.0.0/test-data/vendor-branch-delete-add-cvsrepos/CVSROOT/README
--- cvs2svn-1.5.x/test-data/vendor-branch-delete-add-cvsrepos/CVSROOT/README	1970-01-01 01:00:00.000000000 +0100
+++ cvs2svn-2.0.0/test-data/vendor-branch-delete-add-cvsrepos/CVSROOT/README	2007-08-15 22:53:50.000000000 +0200
@@ -0,0 +1,8 @@
+This CVSROOT/ directory is only here to convince CVS that this is a
+real repository.  Without it, CVS operations fail with an error like:
+
+    cvs [checkout aborted]: .../main-cvsrepos/CVSROOT: No such file or directory
+
+Of course, CVS doesn't seem to require that there actually be any
+files in CVSROOT/, which kind of makes one wonder why it cares about
+the directory at all.
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/verification/README cvs2svn-2.0.0/test-data/verification/README
--- cvs2svn-1.5.x/test-data/verification/README	2003-07-09 22:00:10.000000000 +0200
+++ cvs2svn-2.0.0/test-data/verification/README	1970-01-01 01:00:00.000000000 +0100
@@ -1,9 +0,0 @@
-The files here are a hand-constructed verification (in progress) of
-what cvs2svn.py should produce when run on 'main-cvsrepos'.  The
-regression suite's expectations will eventually be based on this.
-
-I'm just saving this data here because it took long enough to
-generate, and is undergoing complex enough massages, that it's worth
-versioning.  You probably want to ignore it unless you're me.
-
--kfogel
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/verification/commits cvs2svn-2.0.0/test-data/verification/commits
--- cvs2svn-1.5.x/test-data/verification/commits	2004-03-09 18:50:23.000000000 +0100
+++ cvs2svn-2.0.0/test-data/verification/commits	1970-01-01 01:00:00.000000000 +0100
@@ -1,217 +0,0 @@
-Each line is of the form
-
-    DATE; AUTHOR; STATE; REVISION (BRANCHES) (TAGS) PATH
-
-The lines are sorted and lazily divided into commit groups.  (There
-may be more commits than listed here, since the log messages were not
-included, but those will come out in the wash as we verify the
-output.)  Comments and problems are noted in "[]" brackets at the end
-of each commit group.
-
-Here we go:
-
-*** end rev 0 ***
-
-93.06.18.05.46.07; jrandom; Exp; 1.1 () () full-prune/Attic/first
-93.06.18.05.46.07; jrandom; Exp; 1.1 () () full-prune-reappear/sub/Attic/first
-93.06.18.05.46.07; jrandom; Exp; 1.1 () () partial-prune/sub/Attic/first
-
-*** end rev 1 ***
-
-93.06.18.05.46.08; jrandom; Exp; 1.1.1.1 () () full-prune/Attic/first
-93.06.18.05.46.08; jrandom; Exp; 1.1.1.1 () () full-prune-reappear/sub/Attic/first
-93.06.18.05.46.08; jrandom; Exp; 1.1.1.1 () () partial-prune/sub/Attic/first
-
-    [Notes: these files had no 'vendortag' symbol for 1.1.1.1,
-    so they go in an unlabeled branch.]
-
-*** end rev 2 ***
-
-94.06.18.05.46.08; jrandom; Exp; 1.1 () () partial-prune/permanent
-
-*** end rev 3 ***
-
-94.06.18.05.46.08; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) partial-prune/permanent
-
-*** end rev 4 ***
-
-95.03.31.07.44.01; jrandom; Exp; 1.1 () () full-prune/Attic/second
-95.03.31.07.44.01; jrandom; Exp; 1.1 () () full-prune-reappear/sub/Attic/second
-95.03.31.07.44.01; jrandom; Exp; 1.1 () () partial-prune/sub/Attic/second
-
-*** end rev 5 ***
-
-95.03.31.07.44.02; jrandom; Exp; 1.1.1.1 () () full-prune/Attic/second
-95.03.31.07.44.02; jrandom; Exp; 1.1.1.1 () () full-prune-reappear/sub/Attic/second
-95.03.31.07.44.02; jrandom; Exp; 1.1.1.1 () () partial-prune/sub/Attic/second
-    [Again, these files had no 'vendortag' symbol for 1.1.1.1,
-    so they go in an unlabeled branch.]
-
-*** end rev 6 ***
-
-95.12.11.00.27.53; jrandom; dead; 1.2 () () full-prune/Attic/first
-95.12.11.00.27.53; jrandom; dead; 1.2 () () full-prune-reappear/sub/Attic/first
-95.12.11.00.27.53; jrandom; dead; 1.2 () () partial-prune/sub/Attic/first
-
-*** end rev 7 ***
-
-95.12.30.18.37.22; jrandom; dead; 1.3 () () full-prune/Attic/first
-95.12.30.18.37.22; jrandom; dead; 1.3 () () full-prune-reappear/sub/Attic/first
-95.12.30.18.37.22; jrandom; dead; 1.3 () () partial-prune/sub/Attic/first
-
-    [Note: If we change the behavior whereby redeletion causes an empty
-    revision, then the numbering from here on may shift back by 1.]
-
-*** end rev 8 (no-op, because redeletions) ***
-
-96.08.20.23.53.47; jrandom; dead; 1.2 () () full-prune/Attic/second
-96.08.20.23.53.47; jrandom; dead; 1.2 () () full-prune-reappear/sub/Attic/second
-96.08.20.23.53.47; jrandom; dead; 1.2 () () partial-prune/sub/Attic/second
-
-    [Note that the removal of these files causes pruning of directories
-    higher up.  See also note in rev 8 about possible renumbering.]
-
-*** end rev 9 ***
-
-2002.09.29.00.00.00; jrandom; Exp; 1.1 () () single-files/twoquick
-
-*** end rev 10 ***
-
-2002.09.29.00.00.01; jrandom; Exp; 1.2 () (after) single-files/twoquick
-
-*** end rev 11 ***
-
-2002.11.30.19.27.42; jrandom; Exp; 1.1 () () single-files/space fname
-
-*** end rev 12 ***
-
-2002.11.30.19.27.42; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) single-files/space fname
-
-*** end rev 13 ***
-
-2003.01.25.13.43.57; jrandom; Exp; 1.1 () () single-files/attr-exec
-
-*** end rev 14 ***
-
-2003.01.25.13.43.57; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) single-files/attr-exec
-
-*** end rev 15 ***
-
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () () proj/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () () proj/sub1/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () () proj/sub1/subsubA/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () () proj/sub1/subsubB/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () () proj/sub2/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () (T_MIXED) proj/sub2/subsubA/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1 () () proj/sub3/default
-
-*** end rev 16 ***
-
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES, T_ALL_INITIAL_FILES_BUT_ONE) proj/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES, T_ALL_INITIAL_FILES_BUT_ONE) proj/sub1/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES, T_ALL_INITIAL_FILES_BUT_ONE) proj/sub1/subsubA/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES) proj/sub1/subsubB/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES, T_ALL_INITIAL_FILES_BUT_ONE) proj/sub2/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES, T_ALL_INITIAL_FILES_BUT_ONE) proj/sub2/subsubA/default
-2003.05.22.23.20.19; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag, T_ALL_INITIAL_FILES, T_ALL_INITIAL_FILES_BUT_ONE) proj/sub3/default
-
-*** end rev 17 ***
-
-2003.05.23.00.15.26; jrandom; Exp; 1.2 () () proj/sub1/subsubA/default
-2003.05.23.00.15.26; jrandom; Exp; 1.2 () (T_MIXED) proj/sub3/default
-
-*** end rev 18 ***
-
-2003.05.23.00.17.53; jrandom; Exp; 1.2 () (T_MIXED) proj/default
-2003.05.23.00.17.53; jrandom; Exp; 1.2 () (T_MIXED) proj/sub1/default
-2003.05.23.00.17.53; jrandom; Exp; 1.2 () (T_MIXED) proj/sub1/subsubB/default
-2003.05.23.00.17.53; jrandom; Exp; 1.2 () (T_MIXED) proj/sub2/default
-2003.05.23.00.17.53; jrandom; Exp; 1.2 () () proj/sub2/subsubA/default
-2003.05.23.00.17.53; jrandom; Exp; 1.3 () (T_MIXED) proj/sub1/subsubA/default
-2003.05.23.00.17.53; jrandom; Exp; 1.3 () () proj/sub3/default
-
-*** end rev 19 ***
-
-2003.05.23.00.25.26; jrandom; dead; 1.1 () () proj/sub2/Attic/branch_B_MIXED_only
-2003.05.23.00.25.26; jrandom; Exp; 1.1.2.1 (B_MIXED) () proj/sub2/Attic/branch_B_MIXED_only
-
-    [Okay, problem: This revision creates branch B_MIXED, by copying
-    revision 19 of trunk.  But it fails to recopy 'proj/sub3/default'
-    and 'proj/sub2/subsubA/default' from some revision earlier than
-    19, which it needs to do because in those files, B_MIXED sprouts
-    from 1.2 and 1.1 respectively, not 1.3 and 1.2 (the latter being
-    what you get if you just copy from revision 19).  This means that
-    the contents of those files are wrong in branches/B_MIXED/'s first
-    incarnation.  Even if they get fixed up in some later revision,
-    there is still a period of wrongness.]
-
-*** end rev 20 ***
-
-2003.05.23.00.31.36; jrandom; Exp; 1.1.2.1 (B_MIXED) () proj/sub2/subsubA/default
-2003.05.23.00.31.36; jrandom; Exp; 1.2.2.1 (B_MIXED) () proj/default
-2003.05.23.00.31.36; jrandom; Exp; 1.2.2.1 (B_MIXED) () proj/sub1/default
-
-*** end of tentative commit group X ***
-
-2003.05.23.00.48.51; jrandom; Exp; 1.1.2.2 (B_MIXED) () proj/sub2/Attic/branch_B_MIXED_only
-2003.05.23.00.48.51; jrandom; Exp; 1.3 () () proj/sub2/default
-
-*** end of tentative commit group X ***
-
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/1
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/2
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/3
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/4
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/5
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/a
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/b
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/c
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/d
-2003.05.24.01.02.01; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) interleaved/e
-
-*** end of tentative commit group X ***
-
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/1
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/2
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/3
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/4
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/5
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/a
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/b
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/c
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/d
-2003.05.24.01.02.01; jrandom; Exp; 1.1 () () interleaved/e
-
-*** end of tentative commit group X ***
-
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/1
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/2
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/3
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/4
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/5
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/a
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/b
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/c
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/d
-2003.06.03.00.20.01; jrandom; Exp; 1.2 () () interleaved/e
-
-*** end of tentative commit group X ***
-
-2003.06.03.03.20.31; jrandom; Exp; 1.2.2.1 (B_SPLIT) () proj/sub2/subsubA/default
-2003.06.03.03.20.31; jrandom; Exp; 1.2.4.1 (B_SPLIT) () proj/default
-2003.06.03.03.20.31; jrandom; Exp; 1.2.4.1 (B_SPLIT) () proj/sub1/default
-2003.06.03.03.20.31; jrandom; Exp; 1.3.2.1 (B_SPLIT) () proj/sub2/default
-2003.06.03.03.20.31; jrandom; Exp; 1.3.4.1 (B_SPLIT) () proj/sub1/subsubA/default
-
-*** end of tentative commit group X ***
-
-2003.06.03.04.29.14; jrandom; Exp; 1.3 () () proj/sub1/subsubB/default
-2003.06.03.04.33.13; jrandom; Exp; 1.3.2.1 (B_SPLIT) () proj/sub1/subsubB/default
-2003.06.03.04.33.13; jrandom; Exp; 1.3.2.1 (B_SPLIT) () proj/sub3/default
-
-*** end of tentative commit group X ***
-
-2003.06.10.20.19.48; jrandom; Exp; 1.1.1.1 (vendorbranch) (vendortag) full-prune-reappear/appears-later
-2003.06.10.20.19.48; jrandom; Exp; 1.1 () () full-prune-reappear/appears-later
-
-*** end of tentative commit group X ***
diff -purNbBwx .svn cvs2svn-1.5.x/test-data/verification/labels cvs2svn-2.0.0/test-data/verification/labels
--- cvs2svn-1.5.x/test-data/verification/labels	2003-07-09 22:00:10.000000000 +0200
+++ cvs2svn-2.0.0/test-data/verification/labels	1970-01-01 01:00:00.000000000 +0100
@@ -1,176 +0,0 @@
-All the symbolic names that appear on each file, with "+" signs
-following those
associated with commits (in other words, branches on -which no commits happened will lack the plus sign; every other -symbolic name should have it). - -* full-prune/Attic/first - -* full-prune/Attic/second - -* full-prune-reappear/appears-later - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* full-prune-reappear/sub/Attic/first - -* full-prune-reappear/sub/Attic/second - -* interleaved/1 - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/2 - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/3 - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/4 - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/5 - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/a - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/b - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/c - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/d - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* interleaved/e - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* partial-prune/permanent - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* partial-prune/sub/Attic/first - -* partial-prune/sub/Attic/second - -* proj/default - - B_SPLIT:1.2.0.4 + - B_MIXED:1.2.0.2 + - T_MIXED:1.2 + - B_FROM_INITIALS_BUT_ONE:1.1.1.1.0.4 - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES_BUT_ONE:1.1.1.1 + - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* proj/sub1/default - - B_SPLIT:1.2.0.4 + - B_MIXED:1.2.0.2 + - T_MIXED:1.2 + - B_FROM_INITIALS_BUT_ONE:1.1.1.1.0.4 - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES_BUT_ONE:1.1.1.1 + - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* proj/sub1/subsubA/default - - B_SPLIT:1.3.0.4 + - B_MIXED:1.3.0.2 - T_MIXED:1.3 + - B_FROM_INITIALS_BUT_ONE:1.1.1.1.0.4 - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES_BUT_ONE:1.1.1.1 + - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* 
proj/sub1/subsubB/default - - B_SPLIT:1.3.0.2 + - B_MIXED:1.2.0.2 - T_MIXED:1.2 + - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* proj/sub2/Attic/branch_B_MIXED_only - - B_MIXED:1.1.0.2 + - -* proj/sub2/default - - B_SPLIT:1.3.0.2 + - B_MIXED:1.2.0.2 - T_MIXED:1.2 + - B_FROM_INITIALS_BUT_ONE:1.1.1.1.0.4 - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES_BUT_ONE:1.1.1.1 + - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* proj/sub2/subsubA/default - - B_SPLIT:1.2.0.2 + - B_MIXED:1.1.0.2 + - T_MIXED:1.1 + - B_FROM_INITIALS_BUT_ONE:1.1.1.1.0.4 - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES_BUT_ONE:1.1.1.1 + - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* proj/sub3/default - - B_SPLIT:1.3.0.2 + - B_MIXED:1.2.0.2 - T_MIXED:1.2 + - B_FROM_INITIALS_BUT_ONE:1.1.1.1.0.4 - B_FROM_INITIALS:1.1.1.1.0.2 - T_ALL_INITIAL_FILES_BUT_ONE:1.1.1.1 + - T_ALL_INITIAL_FILES:1.1.1.1 + - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* single-files/attr-exec - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* single-files/space fname - - vendortag:1.1.1.1 + - vendorbranch:1.1.1 + - -* single-files/twoquick - - after:1.2 diff -purNbBwx .svn cvs2svn-1.5.x/test-data/verification/log cvs2svn-2.0.0/test-data/verification/log --- cvs2svn-1.5.x/test-data/verification/log 2003-08-07 19:47:26.000000000 +0200 +++ cvs2svn-2.0.0/test-data/verification/log 1970-01-01 01:00:00.000000000 +0100 @@ -1,384 +0,0 @@ -$ grep LastChangedRevision cvs2svn.py -# $LastChangedRevision: 6567 $ - -$ svn log -v file://`pwd`/tmp/r ------------------------------------------------------------------------- -rev 41: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 1 line -Changed paths: - A /tags/after (from /trunk:11) - D /tags/after/partial-prune - -This commit was manufactured by cvs2svn to create tag 'after'. 
------------------------------------------------------------------------- -rev 40: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 2 lines -Changed paths: - A /tags/T_ALL_INITIAL_FILES_BUT_ONE (from /branches/vendorbranch:17) - D /tags/T_ALL_INITIAL_FILES_BUT_ONE/single-files - -This commit was manufactured by cvs2svn to create tag -'T_ALL_INITIAL_FILES_BUT_ONE'. ------------------------------------------------------------------------- -rev 39: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 1 line -Changed paths: - A /tags/vendortag (from /branches/vendorbranch:24) - A /tags/vendortag/full-prune-reappear (from /branches/vendorbranch/full-prune-reappear:31) - -This commit was manufactured by cvs2svn to create tag 'vendortag'. ------------------------------------------------------------------------- -rev 38: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 2 lines -Changed paths: - A /tags/T_ALL_INITIAL_FILES (from /branches/vendorbranch:17) - D /tags/T_ALL_INITIAL_FILES/single-files - -This commit was manufactured by cvs2svn to create tag -'T_ALL_INITIAL_FILES'. ------------------------------------------------------------------------- -rev 37: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 1 line -Changed paths: - A /tags - A /tags/T_MIXED (from /trunk:19) - D /tags/T_MIXED/partial-prune - D /tags/T_MIXED/single-files - -This commit was manufactured by cvs2svn to create tag 'T_MIXED'. ------------------------------------------------------------------------- -rev 35: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 2 lines -Changed paths: - A /branches/B_FROM_INITIALS_BUT_ONE (from /branches/vendorbranch:17) - D /branches/B_FROM_INITIALS_BUT_ONE/single-files - -This commit was manufactured by cvs2svn to create branch -'B_FROM_INITIALS_BUT_ONE'. 
------------------------------------------------------------------------- -rev 32: unknown | 2003-08-07 13:04:48 -0500 (Thu, 07 Aug 2003) | 2 lines -Changed paths: - A /branches/B_FROM_INITIALS (from /branches/vendorbranch:17) - D /branches/B_FROM_INITIALS/single-files - -This commit was manufactured by cvs2svn to create branch -'B_FROM_INITIALS'. ------------------------------------------------------------------------- -rev 31: jrandom | 2003-06-10 15:19:48 -0500 (Tue, 10 Jun 2003) | 2 lines -Changed paths: - A /branches/vendorbranch/full-prune-reappear (from /trunk/full-prune-reappear:30) - -Updated CVS - ------------------------------------------------------------------------- -rev 30: jrandom | 2003-06-10 15:19:47 -0500 (Tue, 10 Jun 2003) | 2 lines -Changed paths: - A /trunk/full-prune-reappear - A /trunk/full-prune-reappear/appears-later - -Initial revision - ------------------------------------------------------------------------- -rev 29: jrandom | 2003-06-02 23:33:13 -0500 (Mon, 02 Jun 2003) | 4 lines -Changed paths: - A /branches/B_SPLIT/proj/sub1/subsubB (from /trunk/proj/sub1/subsubB:28) - M /branches/B_SPLIT/proj/sub1/subsubB/default - M /branches/B_SPLIT/proj/sub3/default - -This change affects sub3/default and sub1/subsubB/default, on branch -B_SPLIT. Note that the latter file did not even exist on this branch -until after some other files had had revisions committed on B_SPLIT. - ------------------------------------------------------------------------- -rev 28: jrandom | 2003-06-02 23:29:14 -0500 (Mon, 02 Jun 2003) | 6 lines -Changed paths: - M /trunk/proj/sub1/subsubB/default - -A trunk change to sub1/subsubB/default. This was committed about an -hour after an earlier change that affected most files on branch -B_SPLIT. This file is not on that branch yet, but after this commit, -we'll branch to B_SPLIT, albeit rooted in a revision that didn't exist -at the time the rest of B_SPLIT was created. 
- ------------------------------------------------------------------------- -rev 27: jrandom | 2003-06-02 22:20:31 -0500 (Mon, 02 Jun 2003) | 5 lines -Changed paths: - A /branches/B_SPLIT (from /trunk:19) - D /branches/B_SPLIT/partial-prune - M /branches/B_SPLIT/proj/default - M /branches/B_SPLIT/proj/sub1/default - M /branches/B_SPLIT/proj/sub1/subsubA/default - D /branches/B_SPLIT/proj/sub1/subsubB - M /branches/B_SPLIT/proj/sub2/default - M /branches/B_SPLIT/proj/sub2/subsubA/default - D /branches/B_SPLIT/single-files - -First change on branch B_SPLIT. - -This change excludes sub3/default, because it was not part of this -commit, and sub1/subsubB/default, which is not even on the branch yet. - ------------------------------------------------------------------------- -rev 26: jrandom | 2003-06-02 19:20:01 -0500 (Mon, 02 Jun 2003) | 2 lines -Changed paths: - M /trunk/interleaved/1 - M /trunk/interleaved/2 - M /trunk/interleaved/3 - M /trunk/interleaved/4 - M /trunk/interleaved/5 - -Committing numbers only. - ------------------------------------------------------------------------- -rev 25: jrandom | 2003-06-02 19:20:01 -0500 (Mon, 02 Jun 2003) | 2 lines -Changed paths: - M /trunk/interleaved/a - M /trunk/interleaved/b - M /trunk/interleaved/c - M /trunk/interleaved/d - M /trunk/interleaved/e - -Committing letters only. - ------------------------------------------------------------------------- -rev 24: jrandom | 2003-05-23 20:02:01 -0500 (Fri, 23 May 2003) | 2 lines -Changed paths: - A /branches/vendorbranch/interleaved (from /trunk/interleaved:23) - -Initial import. 
- ------------------------------------------------------------------------- -rev 23: jrandom | 2003-05-23 20:02:00 -0500 (Fri, 23 May 2003) | 2 lines -Changed paths: - A /trunk/interleaved - A /trunk/interleaved/1 - A /trunk/interleaved/2 - A /trunk/interleaved/3 - A /trunk/interleaved/4 - A /trunk/interleaved/5 - A /trunk/interleaved/a - A /trunk/interleaved/b - A /trunk/interleaved/c - A /trunk/interleaved/d - A /trunk/interleaved/e - -Initial revision - ------------------------------------------------------------------------- -rev 22: jrandom | 2003-05-22 19:48:51 -0500 (Thu, 22 May 2003) | 2 lines -Changed paths: - M /branches/B_MIXED/proj/sub2/branch_B_MIXED_only - M /trunk/proj/sub2/default - -A single commit affecting one file on branch B_MIXED and one on trunk. - ------------------------------------------------------------------------- -rev 21: jrandom | 2003-05-22 19:31:36 -0500 (Thu, 22 May 2003) | 2 lines -Changed paths: - M /branches/B_MIXED/proj/default - M /branches/B_MIXED/proj/sub1/default - M /branches/B_MIXED/proj/sub2/subsubA/default - -Modify three files, on branch B_MIXED. - ------------------------------------------------------------------------- -rev 20: jrandom | 2003-05-22 19:25:26 -0500 (Thu, 22 May 2003) | 2 lines -## problem found, see notes in the 'commits' file. ## -Changed paths: - A /branches/B_MIXED (from /trunk:19) - D /branches/B_MIXED/partial-prune - A /branches/B_MIXED/proj/sub2/branch_B_MIXED_only - D /branches/B_MIXED/single-files - -Add a file on branch B_MIXED. - ------------------------------------------------------------------------- -rev 19: jrandom | 2003-05-22 19:17:53 -0500 (Thu, 22 May 2003) | 2 lines -## verified ## -Changed paths: - M /trunk/proj/default - M /trunk/proj/sub1/default - M /trunk/proj/sub1/subsubA/default - M /trunk/proj/sub1/subsubB/default - M /trunk/proj/sub2/default - M /trunk/proj/sub2/subsubA/default - M /trunk/proj/sub3/default - -Second commit to proj, affecting all 7 files. 
- ------------------------------------------------------------------------- -rev 18: jrandom | 2003-05-22 19:15:26 -0500 (Thu, 22 May 2003) | 2 lines -## verified ## -Changed paths: - M /trunk/proj/sub1/subsubA/default - M /trunk/proj/sub3/default - -First commit to proj, affecting two files. - ------------------------------------------------------------------------- -rev 17: jrandom | 2003-05-22 18:20:19 -0500 (Thu, 22 May 2003) | 2 lines -## verified ## -Changed paths: - A /branches/vendorbranch/proj (from /trunk/proj:16) - -Initial import. - ------------------------------------------------------------------------- -rev 16: jrandom | 2003-05-22 18:20:18 -0500 (Thu, 22 May 2003) | 2 lines -## verified ## -Changed paths: - A /trunk/proj - A /trunk/proj/default - A /trunk/proj/sub1 - A /trunk/proj/sub1/default - A /trunk/proj/sub1/subsubA - A /trunk/proj/sub1/subsubA/default - A /trunk/proj/sub1/subsubB - A /trunk/proj/sub1/subsubB/default - A /trunk/proj/sub2 - A /trunk/proj/sub2/default - A /trunk/proj/sub2/subsubA - A /trunk/proj/sub2/subsubA/default - A /trunk/proj/sub3 - A /trunk/proj/sub3/default - -Initial revision - ------------------------------------------------------------------------- -rev 15: jrandom | 2003-01-25 07:43:57 -0600 (Sat, 25 Jan 2003) | 2 lines -## verified ## -Changed paths: - A /branches/vendorbranch/single-files/attr-exec (from /trunk/single-files/attr-exec:14) - -initial checkin - ------------------------------------------------------------------------- -rev 14: jrandom | 2003-01-25 07:43:56 -0600 (Sat, 25 Jan 2003) | 2 lines -## verified ## -Changed paths: - A /trunk/single-files/attr-exec - -Initial revision - ------------------------------------------------------------------------- -rev 13: jrandom | 2002-11-30 13:27:42 -0600 (Sat, 30 Nov 2002) | 2 lines -## verified ## -Changed paths: - A /branches/vendorbranch/single-files (from /trunk/single-files:12) - D /branches/vendorbranch/single-files/twoquick - -imported - 
------------------------------------------------------------------------- -rev 12: jrandom | 2002-11-30 13:27:41 -0600 (Sat, 30 Nov 2002) | 2 lines -## verified ## -Changed paths: - A /trunk/single-files/space fname - -Initial revision - ------------------------------------------------------------------------- -rev 11: jrandom | 2002-09-28 19:00:01 -0500 (Sat, 28 Sep 2002) | 2 lines -## verified ## -Changed paths: - M /trunk/single-files/twoquick - -*** empty log message *** - ------------------------------------------------------------------------- -rev 10: jrandom | 2002-09-28 19:00:00 -0500 (Sat, 28 Sep 2002) | 2 lines -## verified ## -Changed paths: - A /trunk/single-files - A /trunk/single-files/twoquick - -*** empty log message *** - ------------------------------------------------------------------------- -rev 9: jrandom | 1996-08-20 18:53:47 -0500 (Tue, 20 Aug 1996) | 5 lines -## verified ## -Changed paths: - D /trunk/full-prune - D /trunk/full-prune-reappear - D /trunk/partial-prune/sub - -Remove the file 'second'. - -Since 'first' was already removed, removing 'second' empties the -directory, so the directory itself gets pruned. - ------------------------------------------------------------------------- -rev 7: jrandom | 1995-12-10 18:27:53 -0600 (Sun, 10 Dec 1995) | 4 lines -## verified ## -Changed paths: - D /trunk/full-prune/first - D /trunk/full-prune-reappear/sub/first - D /trunk/partial-prune/sub/first - -Remove the file 'first', for the first time. - -(Note that its sibling 'second' still exists.) - ------------------------------------------------------------------------- -rev 6: jrandom | 1995-03-31 01:44:02 -0600 (Fri, 31 Mar 1995) | 3 lines -## verified ## -Changed paths: - M /trunk/full-prune/second - M /trunk/full-prune-reappear/sub/second - M /trunk/partial-prune/sub/second - -Original sources from CVS-1.4A2 munged to fit our directory structure. 
- - ------------------------------------------------------------------------- -rev 5: jrandom | 1995-03-31 01:44:01 -0600 (Fri, 31 Mar 1995) | 2 lines -## verified ## -Changed paths: - A /trunk/full-prune/second - A /trunk/full-prune-reappear/sub/second - A /trunk/partial-prune/sub/second - -Initial revision - ------------------------------------------------------------------------- -rev 4: jrandom | 1994-06-18 00:46:08 -0500 (Sat, 18 Jun 1994) | 2 lines -## verified (leaves only vendorbranch/partial-prune/permanent) ## -Changed paths: - A /branches - A /branches/vendorbranch (from /trunk:3) - D /branches/vendorbranch/full-prune - D /branches/vendorbranch/full-prune-reappear - D /branches/vendorbranch/partial-prune/sub - -Updated CVS - ------------------------------------------------------------------------- -rev 3: jrandom | 1994-06-18 00:46:07 -0500 (Sat, 18 Jun 1994) | 2 lines -## verified ## -Changed paths: - A /trunk/partial-prune/permanent - -Initial revision - ------------------------------------------------------------------------- -rev 2: jrandom | 1993-06-18 00:46:08 -0500 (Fri, 18 Jun 1993) | 2 lines -## verified ## -Changed paths: - M /trunk/full-prune/first - M /trunk/full-prune-reappear/sub/first - M /trunk/partial-prune/sub/first - -Updated CVS - ------------------------------------------------------------------------- -rev 1: jrandom | 1993-06-18 00:46:07 -0500 (Fri, 18 Jun 1993) | 2 lines -## verified ## -Changed paths: - A /trunk - A /trunk/full-prune - A /trunk/full-prune/first - A /trunk/full-prune-reappear - A /trunk/full-prune-reappear/sub - A /trunk/full-prune-reappear/sub/first - A /trunk/partial-prune - A /trunk/partial-prune/sub - A /trunk/partial-prune/sub/first - -Initial revision - ------------------------------------------------------------------------- -$ diff -purNbBwx .svn cvs2svn-1.5.x/verify-cvs2svn.py cvs2svn-2.0.0/verify-cvs2svn.py --- cvs2svn-1.5.x/verify-cvs2svn.py 2006-09-14 04:55:15.000000000 +0200 +++ 
cvs2svn-2.0.0/verify-cvs2svn.py 2007-08-15 22:53:54.000000000 +0200 @@ -2,7 +2,7 @@ # (Be in -*- python -*- mode.) # # ==================================================================== -# Copyright (c) 2000-2004 CollabNet. All rights reserved. +# Copyright (c) 2000-2007 CollabNet. All rights reserved. # # This software is licensed as described in the file COPYING, which # you should have received as part of this distribution. The terms @@ -374,12 +374,13 @@ def main(argv): elif (opt == '--skip-cleanup'): ctx.skip_cleanup = 1 elif opt == '--symbol-transform': + # This is broken! [pattern, replacement] = value.split(":") try: - pattern = re.compile(pattern) - except re.error, e: + symbol_transforms.append( + RegexpSymbolTransform(pattern, replacement)) + except re.error: raise FatalError("'%s' is not a valid regexp." % (pattern,)) - ctx.symbol_transforms.append((pattern, replacement,)) # Consistency check for options and arguments. if len(args) != 2: diff -purNbBwx .svn cvs2svn-1.5.x/www/cvs2svn.html cvs2svn-2.0.0/www/cvs2svn.html --- cvs2svn-1.5.x/www/cvs2svn.html 2006-10-03 17:14:52.000000000 +0200 +++ cvs2svn-2.0.0/www/cvs2svn.html 2007-08-15 22:53:48.000000000 +0200 @@ -21,6 +21,8 @@ <ul> + <li><a href="#intro">Introduction</a></li> + <li><a href="#reqs">Requirements</a></li> <li><a href="#install">Installation</a></li> @@ -29,6 +31,10 @@ <li><a href="#prep">Prepping your repository</a></li> + <li><a href="#cmd-vs-options">Command line vs. options file</a></li> + + <li><a href="#symbols">Symbol handling</a></li> + <li><a href="#cmd-ref">Command line reference</a></li> <li><a href="#examples">A few examples</a></li> @@ -37,25 +43,137 @@ <hr /> +<h1><a name="intro">Introduction</a></h1> + +<p>cvs2svn is a program that can be used to migrate a CVS repository +to <a href="http://subversion.tigris.org/">Subversion</a> (otherwise +known as "SVN"). 
The primary goal of cvs2svn is to migrate as much +information as possible from your old CVS repository to your new +Subversion repository.</p> + +<p>But unfortunately, CVS doesn't record complete information about +your project's history. For example, CVS doesn't record what file +modifications took place within the same CVS commit. Therefore, +cvs2svn attempts to infer from CVS's incomplete information what +<em>really</em> happened in the history of your repository. So the +second goal of cvs2svn is to reconstruct as much of your CVS +repository's history as possible.</p> + +<h2>cvs2svn features</h2> + +<dl> + + <dt>No information lost</dt> + <dd>cvs2svn works hard to avoid losing any information from your CVS + repository (unless you specifically ask for a partial conversion + using <tt>--trunk-only</tt> or <tt>--exclude</tt>).</dd> + + <dt>Changesets</dt> + <dd>CVS records modifications file-by-file, and does not keep track + of what files were modified at the same time. cvs2svn uses + information like the file modification times, log messages, and + dependency information to reconstruct the original + changesets.</dd> + + <dt>Branch vs. tag</dt> + <dd>CVS allows the same symbol name to be used sometimes as a + branch, sometimes as a tag. cvs2svn has heuristics to decide how + to convert such "mixed" symbols (<tt>--force-branch</tt>, + <tt>--force-tag</tt>, <tt>--symbol-default</tt>).</dd> + + <dt>Branch and tag parents</dt> + <dd>In many cases, the CVS history is ambiguous about which branch + served as the parent of another branch or tag. cvs2svn determines + the most plausible parent for symbols using cross-file + information.</dd> + + <dt>Branch and tag creation times</dt> + <dd>CVS does not record when branches and tags are created. 
cvs2svn + creates branches and tags at a reasonable time, consistent with + the file revisions that were tagged, and tries to create each one + within a single Subversion commit if possible.</dd> + + <dt>Mime types</dt> + <dd>CVS does not record files' mime types. cvs2svn provides several + mechanisms for choosing reasonable file mime types + (<tt>--mime-types</tt>, <tt>--auto-props</tt>).</dd> + + <dt>Binary vs. text</dt> + <dd>Many CVS repositories do not systematically record which files + are binary and which are text. (This is mostly important if the + repository is used on non-Unix systems.) cvs2svn provides a + number of ways to infer this information + (<tt>--eol-from-mime-type</tt>, <tt>--default-eol</tt>, + <tt>--keywords-off</tt>, <tt>--auto-props</tt>).</dd> + + <dt>Subversion file properties</dt> + <dd>Subversion allows arbitrary text properties to be attached to + files. cvs2svn provides a mechanism to set such properties when a + file is first added to the repository + (<tt>--auto-props</tt>).</dd> + + <dt>Timestamp error correction</dt> + <dd>Many CVS repositories contain timestamp errors due to servers' + clocks being set incorrectly during part of the repository's + history. cvs2svn's history reconstruction is relatively robust + against timestamp errors and it writes monotonic timestamps to the + Subversion repository.</dd> + + <dt>Subversion repository customization</dt> + <dd>cvs2svn provides many options that allow you to customize the + structure of the resulting Subversion repository + (<tt>--trunk</tt>, <tt>--branches</tt>, <tt>--tags</tt>, + <tt>--no-prune</tt>, <tt>--symbol-transform</tt>, etc.; see also + the additional customization options available by using the <a + href="#options-file-method"><tt>--options</tt>-file + method</a>).</dd> + + <dt>Support for multiple character encodings</dt> + <dd>CVS does not record which character encoding was used to store + metainformation like file names, author names and log messages. 
+ cvs2svn provides options to convert this text into UTF-8 as used + by Subversion (<tt>--encoding</tt>, + <tt>--fallback-encoding</tt>).</dd> + + <dt>Scalable</dt> + <dd>cvs2svn stores most intermediate data to on-disk databases so + that it can convert very large CVS repositories using a reasonable + amount of RAM. Conversions are organized as multiple passes and + can be restarted at an arbitrary pass in the case of + problems.</dd> + +</dl> + + +<hr /> + <h1><a name="reqs">Requirements</a></h1> <p>cvs2svn requires the following:</p> <ul> + <li>Direct (filesystem) access to a copy of the CVS repository that + you want to convert. cvs2svn parses the files in the CVS + repository directly, so it is not enough to have remote CVS + access. See the <a href="faq.html#repoaccess">FAQ</a> for more + information. + </li> <li>Python 2.2 or greater. See <a href="http://www.python.org/">http://www.python.org/.</a> </li> - <li>One or both of RCS `co' or CVS. The RCS home page is + <li>If you use the <tt>--use-rcs</tt> option, then RCS's `co' + program is required. The RCS home page is <a href="http://www.cs.purdue.edu/homes/trinkle/RCS/" - >http://www.cs.purdue.edu/homes/trinkle/RCS/</a>, - and the CVS home page is + >http://www.cs.purdue.edu/homes/trinkle/RCS/</a>. + See the <a href="#use-rcs"><tt>--use-rcs</tt> flag</a> for more + details. + </li> + <li>If you use the <tt>--use-cvs</tt> option, then the `cvs' command + is required. The CVS home page is <a href="http://ccvs.cvshome.org/">http://ccvs.cvshome.org/</a>. - - By default, cvs2svn only depends on RCS 'co', and we recommend this - as it is faster. Most likely you will never encounter the edge - cases that would make it necessary to use CVS instead. See the <a - href="#use-cvs">--use-cvs flag</a> for more details. + See the <a href="#use-cvs"><tt>--use-cvs</tt> flag</a> for more + details. 
</li> <li>GNU sort, which is part of the coreutils package, see <a href="http://www.gnu.org/software/coreutils/" @@ -269,16 +387,70 @@ any changes to your CVS repository, but around and deleting things in a CVS repository, it's all too easy to shoot yourself in the foot.</p> +<h2>End-of-line translation</h2> + +<p>One of the most important topics to consider when converting a +repository is the distinction between binary and text files. If you +accidentally treat a binary file as text <strong>your repository +contents will be corrupted</strong>.</p> + +<p>Text files are handled differently than binary files by both CVS +and Subversion. When a text file is checked out, the character used +to denote the end of line ("EOL") is converted to the local computer's +format. This is usually the most convenient behavior for text files. +Moreover, both CVS and Subversion allow "keywords" in text files (such +as <tt>$Id$</tt>), which are expanded with version control information +when the file is checked out. However, if line-end translation or +keyword expansion is applied to a binary file, the file will usually +be corrupted.</p> + +<p>For this reason, CVS treats a file as binary unless you +specifically tell it that the file is text. You can tell CVS that a +file is binary by using the command <tt>cvs admin -kb +<i>filename</i></tt>. But often CVS users forget to specify which +files are binary, and as long as the repository is only used under +Unix, they may never notice a problem, because the internal format of +CVS is the same as the Unix format. But Subversion is not as +forgiving as CVS if you tell it to treat a binary file as text.</p> + +<p>If you have been conscientious about marking files as binary in +CVS, then you should be able to use <tt>--default-eol=native</tt>. If +you have been sloppy, then you have a few choices:</p> +<ul> + <li>Convert your repository with cvs2svn's default options. 
Your + text files will be treated as binary, but that usually isn't very + harmful (at least no information will be lost).</li> + + <li>Mend your slovenly ways by fixing your CVS repository + <em>before</em> conversion: run <tt>cvs admin -kb + <i>filename</i></tt> for each binary file in the repository. Then + you can use <tt>--default-eol=native</tt> along with the + anal-retentive folks.</li> + + <li>Use cvs2svn options to help cvs2svn deduce which files are + binary <em>during</em> the conversion. The useful options are + <tt>--eol-from-mime-type</tt>, <tt>--keywords-off</tt>, + <tt>--auto-props</tt>, and <tt>--default-eol</tt>. See the <a + href="faq.html#eol-fixup">FAQ</a> for more information.</li> + +</ul> + +<h2>Converting part of repository</h2> + <p>If you want to convert a subdirectory in your repository, you can -just point cvs2svn at the subdirectory and go.</p> +just point cvs2svn at the subdirectory and go. There is no need to +delete the unwanted directories from the CVS repository.</p> -<p>If there are any files that you <i>don't</i> want converted into -your new Subversion repository, you should delete them or move them -aside.</p> - -Lastly, even though you can move and copy files and directories around -in Subversion, you may want to do some rearranging (typically of -directories) before running your conversion. +<p>If the subdirectory that you are converting contains any files that +you <i>don't</i> want converted into your new Subversion repository, +you should delete them or move them aside. 
Such files can be deleted +from HEAD after the conversion, but they will still be visible in the +repository history.</p> + +<p>Lastly, even though you can move and copy files and directories +around in Subversion, you may want to do some rearranging of project +directories before running your conversion to get the desired +repository project organization.</p> <hr /> @@ -327,51 +499,162 @@ this:</p> <p>Only the following options are allowed in combination with <tt>--options</tt>: <tt>-h/--help</tt>, <tt>--help-passes</tt>, <tt>--version</tt>, <tt>-v/--verbose</tt>, <tt>-q/--quiet</tt>, -<tt>-p</tt>, <tt>--dry-run</tt>, and <tt>--profile</tt>.</p> +<tt>-p/--pass/--passes</tt>, <tt>--dry-run</tt>, and +<tt>--profile</tt>.</p> <hr /> -<h1><a name="cmd-ref">Command line reference</a></h1> +<h1><a name="symbols">Symbol handling</a></h1> +<p>cvs2svn converts CVS tags and branches into Subversion tags and +branches following the <a +href="http://svnbook.red-bean.com/en/1.2/svn.branchmerge.using.html">standard +Subversion convention</a>. This section discusses issues related to +symbol handling.</p> + +<p><strong>HINT:</strong> If there are problems with symbol usage in +your repository, they are usually reported during +<tt>CollateSymbolsPass</tt> (pass 2) of the conversion, causing the +conversion to be interrupted. However, it is not necessary to restart +the whole conversion to fix the problems. Usually it is adequate to +adjust the symbol-handling options then re-start cvs2svn starting at +<tt>CollateSymbolsPass</tt>, by adding the option "<tt>-p +CollateSymbolsPass:</tt>". This trick can save a lot of time if you +have a large repository, as it might take a few iterations before you +find the best set of options to convert your repository.</p> + + +<h2><a name="symbol-exclusion">Excluding tags and branches</a></h2> + +<p>Often a CVS repository contains tags and branches that will not be +needed after the conversion to Subversion. 
You can instruct cvs2svn +to exclude such symbols from the conversion, in which case they will +not be present in the resulting Subversion repository. Please be +careful when doing this; excluding symbols causes information that was +present in CVS to be omitted in Subversion, thereby discarding +potentially useful historical information. Also be aware that if you +exclude a branch, then all CVS revisions that were committed to that +branch will also be excluded.</p> + +<p>To exclude a tag or branch, use the option +<tt>--exclude=SYMBOL</tt>. You can also exclude a whole group of +symbols matching a specified regular expression; for example, +<tt>--exclude='RELEASE_0_.*'</tt>. (The regular expression has to +match the <em>whole</em> symbol name for the rule to apply.)</p> + +<p>However, please note the following restriction. If a branch has a +subbranch or a tag on it, then the branch cannot be excluded unless +the dependent symbol is also excluded. cvs2svn checks for this +situation; if it occurs then <tt>CollateSymbolsPass</tt> outputs an +error message like the following:</p> -<table border="1" cellpadding="10" cellspacing="3" width="80%"> +<pre> + ERROR: The branch 'BRANCH' cannot be excluded because the following symbols depend on it: + 'TAG' + 'SUBBRANCH' +</pre> - <tr> - <td colspan="2"> - USAGE:<br/> - <tt>cvs2svn [-s svn-repos-path|--dumpfile=PATH] cvs-repos-path</tt><br/> - <tt>cvs2svn --options=PATH</tt><br/> - </td> - </tr> +<p>In such a case you can either exclude the dependent symbol(s) (in +this case by using <tt>--exclude=TAG --exclude=SUBBRANCH</tt>) or +<em>not</em> exclude 'BRANCH'.</p> + + +<h2><a name="symbol-inconsistencies">Tag/branch inconsistencies</a></h2> + +<p>In CVS, the same symbol can appear as a tag in some files (e.g., +<tt>cvs tag SYMBOL file1.txt</tt>) and a branch in others (e.g., +<tt>cvs tag -b SYMBOL file2.txt</tt>). 
Subversion takes a more global +view of your repository, and therefore works better when each symbol +is used in a self-consistent way--either always as a branch or always +as a tag. cvs2svn provides features to help you resolve these +ambiguities.</p> + +<p>If your repository contains inconsistently-used symbols, then +<tt>CollateSymbolsPass</tt> reports the problems and aborts the +conversion. The error message looks like this:</p> - <tr> - <td align="right"><tt>--help</tt>, <tt>-h</tt></td> - <td>Print the usage message and exit with success.</td> - </tr> +<pre> + ERROR: It is not clear how the following symbols should be converted. + Use --force-tag, --force-branch, --exclude, and/or --symbol-default to + resolve the ambiguity. + 'SYMBOL1' is a tag in 1 files, a branch in 2 files and has commits in 0 files + 'SYMBOL2' is a tag in 2 files, a branch in 1 files and has commits in 0 files + 'SYMBOL3' is a tag in 1 files, a branch in 2 files and has commits in 1 files +</pre> - <tr> - <td align="right"><tt>--help-passes</tt></td> - <td>Print the numbers and names of the conversion passes and exit - with success.</td> - </tr> +<p>You have to tell cvs2svn how to fix the inconsistencies then +restart the conversion.</p> + +<p>There are three ways to deal with an inconsistent symbol: treat it +as a tag, treat it as a branch, or exclude it from the conversion +altogether.</p> + +<p>In the example above, the symbol 'SYMBOL1' was used as a branch in +two files but used as a tag in only one file. Therefore, it might +make sense to convert it as a branch, by using the option +<tt>--force-branch=SYMBOL1</tt>. However, no revisions were committed +on this branch, so it would also be possible to convert it as a tag, +by using the option <tt>--force-tag=SYMBOL1</tt>. 
If the symbol is +not needed at all, it can be excluded by using +<tt>--exclude=SYMBOL1</tt>.</p> + +<p>Similarly, 'SYMBOL2' was used more often as a tag, but can still be +converted as a branch or a tag, or excluded.</p> + +<p><tt>SYMBOL3</tt>, on the other hand, was sometimes used as a +branch, and at least one revision was committed on the branch. It can +be converted as a branch, using <tt>--force-branch=SYMBOL3</tt>. But +it cannot be converted as a tag (because tags are not allowed to have +revisions on them). If it is excluded, using +<tt>--exclude=SYMBOL3</tt>, then both the branch and the revisions on +the branch will be left out of the Subversion repository.</p> + +<p>If you are not so picky about which symbols are converted as tags +and which as branches, you can ask cvs2svn to decide by itself. To do +this, specify the <tt>--symbol-default=OPTION</tt>, where +<tt>OPTION</tt> can be either "<tt>branch</tt>" (treat every ambiguous +symbol as a branch), "<tt>tag</tt>" (treat every ambiguous symbol as a +tag), or "<tt>heuristic</tt>" (decide how to treat each ambiguous +symbol based on whether it was used more often as a branch or as a tag +in CVS). You can use the <tt>--force-branch</tt> and +<tt>--force-tag</tt> options to specify the treatment of particular +symbols, in combination with <tt>--symbol-default</tt> to specify the +default to be used for other ambiguous symbols.</p> + +<hr /> + + +<h1><a name="cmd-ref">Command line reference</a></h1> + +<table border="1" cellpadding="10" cellspacing="3" width="80%"> <tr> - <td align="right"><tt>--version</tt></td> - <td>Print the version number.</td> + <td colspan="2"> + <strong>USAGE:</strong><br/> + <tt>cvs2svn [OPTIONS]... [-s SVN-REPOS-PATH|--dumpfile=PATH|--dry-run] + CVS-REPOS-PATH</tt><br/> + <tt>cvs2svn [OPTIONS]... 
--options=PATH</tt><br/> + </td> </tr> <tr> - <td align="right"><tt>--verbose</tt>, <tt>-v</tt></td> - <td>Tell cvs2svn to print <i>tons</i> of information about what - it's doing to STDOUT.</td> + <td align="right"><tt>CVS-REPOS-PATH</tt></td> + <td>The filesystem path of the part of the CVS repository that you + want to convert. It is not possible to convert a CVS repository + to which you only have remote access; see <a + href="faq.html#repoaccess">the FAQ</a> for details. This + doesn't have to be the top level directory of a CVS repository; + it can point at a project within a repository, in which case + only that project will be converted. This path or one of its + parent directories has to contain a subdirectory called CVSROOT + (though the CVSROOT directory can be empty).</td> </tr> <tr> - <td align="right"><tt>--quiet</tt>, <tt>-q</tt></td> - <td>Tell cvs2svn to operate in quiet mode, printing little more - than pass starts and stops to STDOUT. This option may be - specified twice to suppress all non-error output.</td> + <th colspan="2"> + Configuration via options file + </th> </tr> <tr> @@ -382,30 +665,34 @@ this:</p> </tr> <tr> - <td align="right"><tt>-s PATH</tt></td> + <th colspan="2"> + Output options + </th> + </tr> + + <tr> + <td align="right"><tt>-s PATH</tt><br/><tt>--svnrepos PATH</tt></td> <td>Load CVS repository into the Subversion repository located at PATH. If there is no Subversion repository at PATH, create a new one.</td> </tr> <tr> - <td align="right"><tt>-p PASS</tt></td> - <td>Execute only pass PASS of the conversion. PASS can be - specified by name or by number (see <tt>--help-passes</tt>)</td> + <td align="right"><tt>--existing-svnrepos</tt></td> + <td>Load the converted CVS repository into an existing Subversion + repository, instead of creating a new repository.</td> </tr> <tr> - <td align="right"><tt>-p [START]:[END]</tt></td> - <td>Execute passes START through END of the conversion - (inclusive). 
START and END can be specified by name or by number - (see <tt>--help-passes</tt>). If START or END is missing, it - defaults to the first or last pass, respectively.</td> + <td align="right"><tt>--fs-type=TYPE</tt></td> + <td>Pass the <tt>--fs-type=TYPE</tt> option to "svnadmin + create" when creating a new repository.</td> </tr> <tr> - <td align="right"><tt>--existing-svnrepos</tt></td> - <td>Load the converted CVS repository into an existing Subversion - repository, instead of creating a new repository.</td> + <td align="right"><tt>--bdb-txn-nosync</tt></td> + <td>Pass the <tt>--bdb-txn-nosync</tt> switch to "svnadmin + create" when creating a new repository.</td> </tr> <tr> @@ -424,28 +711,9 @@ this:</p> </tr> <tr> - <td align="right"><a name="use-cvs"><tt>--use-cvs</tt></a></td> - <td>If RCS <b><tt>co</tt></b> is having trouble extracting certain - revisions, you may need to pass this flag, which causes cvs2svn - to use CVS instead of RCS to read the repository. RCS is much - faster, so it's the default, but in certain rare cases it has - problems with data that CVS doesn't have problems with. - Specifically: - <ul> - <li>RCS can't handle spaces in author names:<br/> - <a href="http://cvs2svn.tigris.org/issues/show_bug.cgi?id=4" - >http://cvs2svn.tigris.org/issues/show_bug.cgi?id=4</a> - </li> - <li>"Unterminated keyword" misread by RCS:<br/> - <a href="http://cvs2svn.tigris.org/issues/show_bug.cgi?id=11" - >http://cvs2svn.tigris.org/issues/show_bug.cgi?id=11</a> - </li> - <li>RCS handles the "$Log$" keyword differently from CVS:<br/> - <a href="http://cvs2svn.tigris.org/issues/show_bug.cgi?id=29" - >http://cvs2svn.tigris.org/issues/show_bug.cgi?id=29</a> - </li> - </ul> - </td> + <th colspan="2"> + Conversion options + </th> </tr> <tr> @@ -483,9 +751,14 @@ this:</p> <tr> <td align="right"><tt>--encoding=ENC</tt></td> <td>Use ENC as the encoding for filenames, log messages, and - author names in the CVS repos. 
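The multiple-<tt>--encoding</tt> lookup described for this option (each encoding is tried in order until one succeeds, with <tt>--fallback-encoding</tt> as a lossy last resort) can be sketched in Python. This is an illustrative sketch only; the function name is invented here and is not part of cvs2svn:

```python
def decode_metadata(raw, encodings=('ascii',), fallback=None):
    """Try each --encoding in order; if all fail, mimic
    --fallback-encoding's lossy 'replace' mode."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            pass  # try the next encoding in the list
    if fallback is not None:
        # Lossy: undecodable bytes become U+FFFD replacement characters.
        return raw.decode(fallback, errors='replace')
    raise ValueError('all encodings failed and no fallback was given')
```

Any of Python's standard encoding names can be passed, matching the option's documented values.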
This option may be specified - multiple times, in which case the encodings are tried in order - until one succeeds. Default: ascii.</td> + author names in the CVS repos. (By using an <tt>--options</tt> + file, it is possible to specify one set of encodings to use for + filenames and a second set for log messages and author names.) + This option may be specified multiple times, in which case the + encodings are tried in order until one succeeds. Default: + ascii. Other possible values include the <a + href="http://docs.python.org/lib/standard-encodings.html">standard + Python encodings</a>.</td> </tr> <tr> @@ -493,8 +766,42 @@ this:</p> <td>If all of the encodings specified with <tt>--encoding</tt> fail, then fall back to using ENC in lossy 'replace' mode. Use of this option may cause information to be lost, but at least it - allows the conversion to run to completion. Default: - disabled.</td> + allows the conversion to run to completion. This option only + affects the encoding of log messages and author names; there is + no fallback encoding for filenames. (By using an <tt>--options</tt> + file, it is possible to specify a fallback encoding for + filenames.) Default: disabled.</td> + </tr> + + <tr> + <td align="right"><tt>--symbol-transform=PAT:SUB</tt></td> + <td><p>Transform RCS/CVS symbol names before entering them into + Subversion. PAT is a Python regular expression pattern that is + matched against the entire symbol name. If it matches, the + symbol is replaced with SUB, which is a replacement pattern + using Python's reference syntax. You may specify any number of + these options; they will be applied in the order given on the + command line.</p> + + <p>This option can be useful if you're converting a repository in + which the developer used directory-wide symbol names like 1_0, 1_1 + and 2_1 as a kludgy form of release tagging (the C-x v s command + in Emacs VC mode encourages this practice). 
A command like</p> + +<pre> +cvs2svn --symbol-transform='([0-9])-(.*):release-\1.\2' -s SVN RCS +</pre> + + <p>will transform a local CVS repository into a local SVN repository, + performing the following sort of mappings of RCS symbolic names to + SVN tags:</p> + +<pre> +1-0 → release-1.0 +1-1 → release-1.1 +2-0 → release-2.0 +</pre> + </td> + </tr> <tr> @@ -539,32 +846,17 @@ this:</p> </tr> <tr> - <td align="right"><tt>--symbol-transform=PAT:SUB</tt></td> - <td><p>Transform RCS/CVS symbol names before entering them into - Subversion. PAT is a Python regexp pattern and SUB is a - replacement pattern using Python's reference syntax. You may - specify any number of these options; they will be - applied in the order given on the command line.</p> - - <p>This option can be useful if you're converting a repository in - which the developer used directory-wide symbol names like 1_0, 1_1 - and 2_1 as a kludgy form of release tagging (the C-x v s command - in Emacs VC mode encourages this practice). A command like</p> - -<pre> -cvs2svn --symbol-transform='([0-9])-(.*):release-\1.\2' -q -s SVN RCS -</pre> - - <p>will transform a local CVS repository into a local SVN repository, - performing the following sort of mappings of RCS symbolic names to - SVN tags:</p> + <td align="right"><tt>--no-cross-branch-commits</tt></td> + <td>Prevent the creation of SVN commits that affect multiple + branches or trunk and a branch. Instead, break such changesets + into multiple commits, one per branch.</td> + </tr> -<pre> -1-0 → release-1.0 -1-1 → release-1.1 -2-0 → release-2.0 -</pre> - </td> + <tr> + <td align="right"><tt>--retain-conflicting-attic-files</tt></td> + <td>If a file appears both inside and outside of the CVS attic, + retain the attic version in an SVN subdirectory called `Attic'. 
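The symbol-name mapping shown in the <tt>--symbol-transform</tt> example can be reproduced with Python's <tt>re</tt> module. This is an illustrative sketch of the documented whole-name matching rule, not cvs2svn's internal code, and the function name is invented here:

```python
import re

def transform_symbol(name, pat=r'([0-9])-(.*)', sub=r'release-\1.\2'):
    # --symbol-transform requires PAT to match the *entire* symbol name,
    # so use fullmatch rather than search; non-matching names pass through.
    m = re.fullmatch(pat, name)
    return m.expand(sub) if m else name
```

With the default pattern above, '1-0' maps to 'release-1.0' while an unrelated name such as 'HEAD' is left unchanged.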
+ (Normally this situation is treated as a fatal error.)</td> </tr> <tr> @@ -574,18 +866,6 @@ cvs2svn --symbol-transform='([0-9])-(.*) </tr> <tr> - <td align="right"><tt>--fs-type=TYPE</tt></td> - <td>Pass the <tt>--fs-type=TYPE</tt> option to "svnadmin - create" when creating a new repository.</td> - </tr> - - <tr> - <td align="right"><tt>--bdb-txn-nosync</tt></td> - <td>Pass the <tt>--bdb-txn-nosync</tt> switch to "svnadmin - create" when creating a new repository.</td> - </tr> - - <tr> <td align="right"><tt>--cvs-revnums</tt></td> <td>Record CVS revision numbers as file properties in the Subversion repository. (Note that unless it is removed @@ -602,23 +882,6 @@ cvs2svn --symbol-transform='([0-9])-(.*) </tr> <tr> - <td align="right"><tt>--auto-props=FILE</tt></td> - <td>Specify a file in the format of Subversion's config file, - whose <tt>[auto-props]</tt> section can be used to set arbitrary - properties on files in the Subversion repository based on their - filenames. (The <tt>[auto-props]</tt> section header must be - present; other sections of the config file, including the - <tt>enable-auto-props</tt> setting, are ignored.) Filenames are - matched to the filename patterns case-sensitively unless the - <tt>--auto-props-ignore-case</tt> option is specified.</td> - </tr> - - <tr> - <td align="right"><tt>--auto-props-ignore-case</tt></td> - <td>Ignore case when pattern-matching auto-props patterns.</td> - </tr> - - <tr> <td align="right"><tt>--eol-from-mime-type</tt></td> <td>For files that don't have the <tt>kb</tt> expansion mode but have a known mime type, set the eol-style based on the mime @@ -631,12 +894,23 @@ cvs2svn --symbol-transform='([0-9])-(.*) </tr> <tr> - <td align="right"><tt>--no-default-eol</tt></td> - <td>Files that don't have the <tt>kb</tt> expansion mode and (if - <tt>--eol-from-mime-type</tt> is set) unknown mime type usually - have their <tt>svn:eol-style</tt> property to "native". 
If this - option is specified, such files are left with no eol-style - (i.e., no EOL translation).</td> + <td align="right"><tt>--auto-props=FILE</tt></td> + <td>Specify a file in the format of Subversion's config file, + whose <tt>[auto-props]</tt> section can be used to set arbitrary + properties on files in the Subversion repository based on their + filenames. (The <tt>[auto-props]</tt> section header must be + present; other sections of the config file, including the + <tt>enable-auto-props</tt> setting, are ignored.) Filenames are + matched to the filename patterns case-insensitively.</td> + </tr> + + <tr> + <td align="right"><tt>--default-eol=STYLE</tt></td> + <td>Set <tt>svn:eol-style</tt> to STYLE for files that don't have + the <tt>kb</tt> expansion mode and whose end-of-line translation + mode hasn't been determined by one of the other options. STYLE + can be "<tt>binary</tt>" (default), "<tt>native</tt>", + "<tt>CRLF</tt>", "<tt>LF</tt>", or "<tt>CR</tt>".</td> </tr> <tr> @@ -651,49 +925,171 @@ cvs2svn --symbol-transform='([0-9])-(.*) </tr> <tr> - <td align="right"><tt>--tmpdir=PATH</tt></td> - <td>Use the directory PATH for all of cvs2svn's temporary data - (which can be a <i>lot</i> of data). The default value is the - current working directory.</td> + <th colspan="2"> + Extraction options + </th> </tr> <tr> - <td align="right"><tt>--skip-cleanup</tt></td> - <td>Prevent the deletion of the temporary files that cvs2svn - creates in the process of conversion.</td> + <td align="right"> + <a name="use-internal-co"><tt>--use-internal-co</tt></a> + </td> + <td>Use internal code to extract the contents of CVS revisions. + This is the default extraction option. This is up to 50% faster + than <tt>--use-rcs</tt>, but needs a lot of disk space: roughly + the size of your CVS repository plus the peak size of a complete + checkout of the repository with all branches that existed and + still had commits pending at a given time. 
If this option is + used, the <tt>$Log$</tt> keyword is not handled. + </td> </tr> <tr> - <td align="right"><tt>--profile</tt></td> - <td>Dump Python <a href="http://docs.python.org/lib/module-hotshot.html" - >Hotshot</a> profiling data to the file <tt>cvs2svn.hotshot</tt>.</td> + <td align="right"><a name="use-rcs"><tt>--use-rcs</tt></a></td> + <td>Use RCS's <b><tt>co</tt></b> command to extract the contents + of CVS revisions. RCS is much faster than CVS, but in certain + rare cases it has problems with data that CVS can handle. + Specifically: + <ul> + <li>RCS can't handle spaces in author names:<br/> + <a href="http://cvs2svn.tigris.org/issues/show_bug.cgi?id=4" + >http://cvs2svn.tigris.org/issues/show_bug.cgi?id=4</a> + </li> + <li>"Unterminated keyword" misread by RCS:<br/> + <a href="http://cvs2svn.tigris.org/issues/show_bug.cgi?id=11" + >http://cvs2svn.tigris.org/issues/show_bug.cgi?id=11</a> + </li> + <li>RCS handles the "$Log$" keyword differently from CVS:<br/> + <a href="http://cvs2svn.tigris.org/issues/show_bug.cgi?id=29" + >http://cvs2svn.tigris.org/issues/show_bug.cgi?id=29</a> + </li> + </ul> + If you are having trouble in <tt>OutputPass</tt> of a + conversion when using the <tt>--use-rcs</tt> option, the first + thing to try is using the <tt>--use-cvs</tt> option instead. + </td> + </tr> + + <tr> + <td align="right"><a name="use-cvs"><tt>--use-cvs</tt></a></td> + <td>If RCS <b><tt>co</tt></b> is having trouble extracting CVS + revisions, you may need to pass this flag, which causes cvs2svn + to use CVS instead of RCS to read the repository. See <a + href="#use-rcs"><tt>--use-rcs</tt></a> for more information. + </td> + </tr> + + <tr> + <th colspan="2"> + Environment options + </th> + </tr> + + <tr> + <td align="right"><tt>--tmpdir=PATH</tt></td> + <td>Use the directory PATH for all of cvs2svn's temporary data + (which can be a <i>lot</i> of data). 
The default value is + <tt>cvs2svn-tmp</tt> in the current working directory.</td> </tr> <tr> <td align="right"><tt>--svnadmin=PATH</tt></td> - <td>If the svnadmin program is not in your $PATH you should - specify its absolute path with this switch.</td> + <td>If the <tt>svnadmin</tt> program is not in your $PATH you + should specify its absolute path with this switch. + <tt>svnadmin</tt> is needed when the <tt>-s/--svnrepos</tt> + output option is used.</td> </tr> <tr> <td align="right"><tt>--co=PATH</tt></td> - <td>If the co program (a part of RCS) is not in your $PATH you should - specify its absolute path with this switch. (co is needed if - <tt>--use-cvs</tt> is not specified.)</td> + <td>If the <tt>co</tt> program (a part of RCS) is not in your + $PATH you should specify its absolute path with this switch. + (<tt>co</tt> is needed if the <tt>--use-rcs</tt> extraction + option is used.)</td> </tr> <tr> <td align="right"><tt>--cvs=PATH</tt></td> <td>If the cvs program is not in your $PATH you should - specify its absolute path with this switch. (cvs is needed if - <tt>--use-cvs</tt> is specified.)</td> + specify its absolute path with this switch. (<tt>cvs</tt> is + needed if the <tt>--use-cvs</tt> extraction option is + used.)</td> </tr> <tr> <td align="right"><tt>--sort=PATH</tt></td> - <td>If the GNU sort program is not in your $PATH you should - specify its absolute path with this switch. cvs2svn requires - GNU sort; Windows <tt>sort.exe</tt> is <b>not</b> adequate.</td> + <td>If the GNU <tt>sort</tt> program is not in your $PATH you + should specify its absolute path with this switch. cvs2svn + requires GNU <tt>sort</tt>; Windows <tt>sort.exe</tt> is + <b>not</b> adequate.</td> + </tr> + + <tr> + <th colspan="2"> + Partial conversions + </th> + </tr> + + <tr> + <td align="right"><tt>-p PASS</tt><br/><tt>--pass PASS</tt></td> + <td>Execute only pass PASS of the conversion. 
PASS can be + specified by name or by number (see <tt>--help-passes</tt>)</td> + </tr> + + <tr> + <td align="right"><tt>-p [START]:[END]</tt><br/><tt>--passes [START]:[END]</tt></td> + <td>Execute passes START through END of the conversion + (inclusive). START and END can be specified by name or by number + (see <tt>--help-passes</tt>). If START or END is missing, it + defaults to the first or last pass, respectively.</td> + </tr> + + <tr> + <th colspan="2"> + Information options + </th> + </tr> + + <tr> + <td align="right"><tt>--version</tt></td> + <td>Print the version number.</td> + </tr> + + <tr> + <td align="right"><tt>--help</tt>, <tt>-h</tt></td> + <td>Print the usage message and exit with success.</td> + </tr> + + <tr> + <td align="right"><tt>--help-passes</tt></td> + <td>Print the numbers and names of the conversion passes and exit + with success.</td> + </tr> + + <tr> + <td align="right"><tt>--verbose</tt>, <tt>-v</tt></td> + <td>Tell cvs2svn to print lots of information about what + it's doing to STDOUT. This option can be specified twice to get + debug-level output.</td> + </tr> + + <tr> + <td align="right"><tt>--quiet</tt>, <tt>-q</tt></td> + <td>Tell cvs2svn to operate in quiet mode, printing little more + than pass starts and stops to STDOUT. 
This option may be + specified twice to suppress all non-error output.</td> + </tr> + + <tr> + <td align="right"><tt>--skip-cleanup</tt></td> + <td>Prevent the deletion of the temporary files that cvs2svn + creates in the process of conversion.</td> + </tr> + + <tr> + <td align="right"><tt>--profile</tt></td> + <td>Dump Python <a href="http://docs.python.org/lib/module-hotshot.html" + >Hotshot</a> profiling data to the file <tt>cvs2svn.hotshot</tt>.</td> </tr> </table> @@ -706,14 +1102,14 @@ cvs2svn --symbol-transform='([0-9])-(.*) repository, run the script like this:</p> <pre> - $ cvs2svn -s NEW_SVNREPOS CVSREPOS + $ cvs2svn --svnrepos NEW_SVNREPOS CVSREPOS </pre> <p>To create a new Subversion repository containing only trunk commits, and omitting all branches and tags from the CVS repository, do</p> <pre> - $ cvs2svn --trunk-only -s NEW_SVNREPOS CVSREPOS + $ cvs2svn --trunk-only --svnrepos NEW_SVNREPOS CVSREPOS </pre> <p>To create a Subversion dumpfile (suitable for 'svnadmin load') from @@ -730,12 +1126,13 @@ specify <tt>--options</tt>:</p> $ cvs2svn --options OPTIONSFILE </pre> -<p>As it works, cvs2svn will create many temporary files in the -current directory (or the directory specified with <tt>--tmpdir</tt>). -This is normal. If the entire conversion is successful, however, -those tempfiles will be automatically removed. If the conversion is -not successful, or if you specify the '--skip-cleanup' option, cvs2svn -will leave the temporary files behind for possible debugging.</p> +<p>As it works, cvs2svn will create many temporary files in a +temporary directory called "cvs2svn-tmp" (or the directory specified +with <tt>--tmpdir</tt>). This is normal. If the entire conversion is +successful, however, those tempfiles will be automatically removed. 
+If the conversion is not successful, or if you specify the +'--skip-cleanup' option, cvs2svn will leave the temporary files behind +for possible debugging.</p> </div> </body> diff -purNbBwx .svn cvs2svn-1.5.x/www/faq.html cvs2svn-2.0.0/www/faq.html --- cvs2svn-1.5.x/www/faq.html 2006-10-03 17:14:52.000000000 +0200 +++ cvs2svn-2.0.0/www/faq.html 2007-08-15 22:53:48.000000000 +0200 @@ -21,6 +21,9 @@ <ol> + <li><a href="#repoaccess">How can I convert a CVS repository to + which I only have remote access?</a></li> + <li><a href="#oneatatime">How can I convert my CVS repository one module at a time?</a></li> @@ -34,6 +37,9 @@ that <tt>trunk/tags/branches</tt> are inside of <tt>foo</tt>?</a></li> + <li><a href="#eol-fixup">How do I fix up end-of-line translation + problems?</a></li> + <li><a href="#path-symbol-transforms">I want a single project but tag-rewriting rules that vary by subdirectory. Can this be done?</a></li> @@ -54,6 +60,8 @@ command '['co', '-q', '-x,v', '-p1.1', '-kk', '/home/cvsroot/myfile,v']' failed" in pass 8.</a></li> + <li><a href="#gdbm-nfs">gdbm.error: (45, 'Operation not supported')</a></li> + </ol> <p><strong>Getting help:</strong></p> @@ -72,6 +80,40 @@ <h2>How-to:</h2> +<h3><a name="repoaccess" title="#repoaccess">How can I convert a CVS +repository to which I only have remote access?</a></h3> + +<p>cvs2svn requires direct, filesystem access to a copy of the CVS +repository that you want to convert. The reason for this requirement +is that cvs2svn directly parses the <tt>*,v</tt> files that make up +the CVS repository.</p> + +<p>Many remote hosting sites provide access to backups of your CVS +repository, which could be used for a cvs2svn conversion. For +example:</p> + +<ul> + <li><a href="http://sourceforge.net">SourceForge</a> allows CVS + content to be accessed via + <a href="http://sourceforge.net/docs/E04/en/#rsync">rsync</a>. 
In + fact, they provide <a + href="http://sourceforge.net/docman/display_doc.php?docid=31070&group_id=1#import">complete instructions</a> + for migrating a SourceForge project from CVS to SVN.</li> + <li>...<i>(other examples welcome)</i></li> +</ul> + +<p>If your provider does not provide any way to download your CVS +repository, there is a possible workaround. The <a +href="http://cvs.m17n.org/~akr/cvssuck/">CVSsuck</a> tool claims to be +able to mirror a remote CVS repository using only CVS commands. It +should be possible to use this tool to fetch a copy of your CVS +repository from your provider, then to use cvs2svn to convert the +copy. However, the developers of cvs2svn do not have any experience +with this tool, so you are on your own here. If you try this method, +please tell us about your experience on the <a +href="mailto:users@cvs2svn.tigris.org">users mailing list</a>.</p> + + <h3><a name="oneatatime" title="#oneatatime">How can I convert my CVS repository one module at a time?</a></h3> @@ -221,6 +263,128 @@ ctx.add_project( information.</p> +<h3><a name="eol-fixup" title="#eol-fixup">How do I fix up end-of-line + translation problems?</a></h3> + + <p>Warning: cvs2svn's handling of end-of-line options changed + between version 1.5.x and version 2.0.x. <strong>This + documentation applies to version 2.0.x and later.</strong> The + documentation applying to an earlier version can be found in the + <tt>www</tt> directory of that release of cvs2svn.</p> + + <p>Starting with version 2.0, the default behavior of cvs2svn is to + treat all files as binary except those explicitly determined to be + text. (Previous versions treated files as text unless they were + determined to be binary.) 
This behavior was changed because, + generally speaking, it is safer to treat a text file as binary + than vice versa.</p> + + <p>However, it is often preferable to set + <tt>svn:eol-style=native</tt> for text files, so that their + end-of-line format is converted to that of the client platform + when the file is checked out. This section describes how to + get the settings that you want.</p> + + <p>If a file is marked as binary in CVS (with <tt>cvs admin + -kb</tt>), then cvs2svn will always treat the file as binary. For + other files, cvs2svn has a number of options that can help choose + the correct end-of-line translation parameters during the + conversion:</p> + + <table border="1" cellpadding="10" cellspacing="3" width="80%"> + + <tr> + <td align="right"><tt>--auto-props=FILE</tt></td> + <td><p>Set arbitrary Subversion properties on files based on the + auto-props section of a file in svn config format. The + auto-props file might have content like this:</p> +<pre> +[auto-props] +*.txt = svn:mime-type=text/plain;svn:eol-style=native +*.doc = svn:mime-type=application/msword;svn:eol-style= +</pre> + <p>This option can also be used in combination with + <tt>--eol-from-mime-type</tt>.</p> + <p>Please note that cvs2svn treats auto-props directives a + bit differently than Subversion: if cvs2svn sees a setting + like <tt>svn:eol-style=</tt> (with no value), it forces the + property to remain <em>unset</em>, even if later rules would + otherwise set the property. Subversion, in this situation, + would try to set the property to the empty string.</p></td> + </tr> + + <tr> + <td align="right"><tt>--mime-types=FILE</tt></td> + <td><p>Specifies an Apache-style mime.types file for setting + files' <tt>svn:mime-type</tt> property based on the file + extension. 
The mime-types file might have content like + this:</p> +<pre> +text/plain txt +application/msword doc +</pre> + <p>This option only has an effect on <tt>svn:eol-style</tt> + if it is used in combination with + <tt>--eol-from-mime-type</tt>.</p></td> + </tr> + + <tr> + <td align="right"><tt>--eol-from-mime-type</tt></td> + <td>Set <tt>svn:eol-style</tt> based on the file's mime type + (if known). If the mime type starts with "<tt>text/</tt>", + then the file is treated as a text file; otherwise, it is + treated as binary. This option is useful in combination with + <tt>--auto-props</tt> or <tt>--mime-types</tt>.</td> + </tr> + + <tr> + <td align="right"><tt>--default-eol=STYLE</tt></td> + <td>Usually cvs2svn treats a file as binary unless one of the + other rules determines that it is binary or if it is marked + as binary in CVS. But if this option is specified, then + cvs2svn uses the specified style as the default. STYLE can + be 'binary' (default), 'native', 'CRLF', 'LF', or 'CR'. If + you have been diligent about annotating binary files in CVS, + or if you are confident that the above options will catch + all of your binary files, then this option may be just the + thing you need.</td> + </tr> + + </table> + + <p>If you don't use any of these options, then cvs2svn will not + arrange any line-end translation whatsoever. The file contents in + the SVN repository should be the same as the contents you would + get if checking out with CVS on the machine on which cvs2svn is + run. 
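The <tt>--eol-from-mime-type</tt> rule described in the table above is simple enough to sketch in Python; the function name is invented here for illustration:

```python
def eol_style_from_mime_type(mime_type):
    # Per --eol-from-mime-type: a mime type beginning with "text/" marks the
    # file as text (svn:eol-style=native); any other mime type, or none at
    # all, leaves the file treated as binary (no eol-style, no translation).
    if mime_type and mime_type.startswith('text/'):
        return 'native'
    return None
```

So with the example mime.types file above, <tt>*.txt</tt> files (text/plain) would get native line endings while <tt>*.doc</tt> files (application/msword) would be left untranslated.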
This also means that the EOL characters of text files will + be the same no matter where the SVN data are checked out (i.e., + not translated to the checkout machine's EOL format).</p> + + <p>To do a better job, you can use <tt>--auto-props</tt>, + <tt>--mime-types</tt>, and <tt>--eol-from-mime-type</tt> to + specify exactly which properties to set on each file based on its + filename.</p> + + <p>For total control over setting properties on files, you can use + the <a + href="cvs2svn.html#options-file-method"><tt>--options</tt>-file + method</a> and write your own <tt>SVNPropertySetter</tt> in + Python. For example,</p> +<pre> +from cvs2svn_lib.property_setters import SVNPropertySetter + +class MyPropertySetter(SVNPropertySetter): + def set_properties(self, s_item): + if s_item.cvs_rev.cvs_file.cvs_path.startswith('path/to/funny/files/'): + s_item.svn_props['svn:mime-type'] = 'text/plain' + s_item.svn_props['svn:eol-style'] = 'CRLF' + +ctx.svn_property_setters.append(MyPropertySetter()) +</pre> + <p>See the file <tt>cvs2svn_lib/property_setters.py</tt> for more + examples.</p> + + <h3><a name="path-symbol-transforms" title="#path-symbol-transforms">I want a single project but tag-rewriting rules that vary by subdirectory. Can this be done?</a></h3> @@ -273,7 +437,7 @@ for subdir in ['project1', 'project2', ' symbol_transforms.append( MySymbolTransform( subdir, - r'^release-(\d+)_(\d+)$', + r'release-(\d+)_(\d+)', r'%s-release-\1.\2' % subdir)) # Now register the project, using our own symbol transforms: @@ -358,6 +522,15 @@ a backup</b> before starting. Never run repository--always work on a copy of your repository.</p> <ol> + <li>Restart the conversion with the + <tt>--retain-conflicting-attic-files</tt> option. This causes the + non-attic and attic versions of the file to be converted + separately, with the <tt>Attic</tt> version stored to a new + subdirectory as <tt>path/Attic/file.txt</tt>. 
This approach + avoids losing any history, but by moving the <tt>Attic</tt> + version of the file to a different subdirectory it might cause + historical revisions to be broken.</li> + <li>Remove the <tt>Attic</tt> version of the file and restart the conversion. Sometimes it represents an old version of the file that was deleted long ago, and it won't be missed. But this @@ -383,17 +556,6 @@ repository--always work on a copy of you $ rm repo/path/file.txt,v </pre></li> - <li>Rename the <tt>Attic</tt> version of the file and restart the - conversion. This avoids losing history, but it changes the name - of the <tt>Attic</tt> version of the file to - <tt>file-from-Attic.txt</tt> whenever it appeared, and might - thereby cause revisions to be broken. - - <pre> - # You did make a backup, right? - $ mv repo/path/Attic/file.txt,v repo/path/Attic/file-from-Attic.txt,v - </pre></li> - <li>Rename the non-<tt>Attic</tt> version of the file and restart the conversion. This avoids losing history, but it changes the name of the non-<tt>Attic</tt> version of the file to @@ -429,6 +591,16 @@ RCS, or to ensure that CVS is installed <a href="cvs2svn.html#use-cvs"><tt>--use-cvs</tt></a> option. </p> +<h3><a name="gdbm-nfs" title="#gdbm-nfs">gdbm.error: (45, 'Operation + not supported')</a></h3> + +<p>This has been reported to be caused by trying to create gdbm +databases on an NFS partition. Apparently gdbm does not support +databases on NFS partitions. 
The workaround is to use the +<tt>--tmpdir</tt> option to choose a local partition for cvs2svn to +write its temporary files.</p> + + <h2>Getting help:</h2> <h3><a name="gettinghelp" title="#gettinghelp">How do I get diff -purNbBwx .svn cvs2svn-1.5.x/www/project_license.html cvs2svn-2.0.0/www/project_license.html --- cvs2svn-1.5.x/www/project_license.html 2006-05-17 17:43:34.000000000 +0200 +++ cvs2svn-2.0.0/www/project_license.html 2007-08-15 22:53:48.000000000 +0200 @@ -34,7 +34,7 @@ below.</p> <pre> /* ================================================================ - * Copyright (c) 2000-2004 CollabNet. All rights reserved. + * Copyright (c) 2000-2007 CollabNet. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are
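The per-subdirectory symbol-transform hunk above rewrites a CVS tag such as
`release-1_2` into `project1-release-1.2`. The rule boils down to a regex
substitution, which can be sketched with nothing but the standard `re`
module. Note this is an illustrative stand-in, not cvs2svn's actual API:
the function name `transform_symbol` is invented, and the whole-name
matching behavior is an assumption (it would explain why the patch could
drop the explicit `^...$` anchors).

```python
import re

def transform_symbol(subdir, name):
    """Rewrite a CVS tag like 'release-1_2' to '<subdir>-release-1.2'.

    Hypothetical stand-in for cvs2svn's symbol-transform machinery;
    assumes the pattern must match the entire symbol name.
    """
    pattern = r'release-(\d+)_(\d+)'           # the post-patch, unanchored form
    replacement = r'%s-release-\1.\2' % subdir  # same template as in the hunk
    # fullmatch emulates matching against the whole symbol name, so a tag
    # like 'prerelease-1_2' is left alone even without ^...$ anchors.
    if re.fullmatch(pattern, name):
        return re.sub(pattern, replacement, name)
    return name  # non-matching symbols pass through unchanged

for subdir in ['project1', 'project2']:
    print(subdir, transform_symbol(subdir, 'release-1_2'))
```

The same pattern/replacement pair could then be fed to each project's real
symbol transform in the options file, as the hunk shows.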