
Extremely slow performance on large binary files with many revisions #11

Open
GoogleCodeExporter opened this issue Apr 16, 2015 · 3 comments

@GoogleCodeExporter (Contributor)

What steps will reproduce the problem?
1. Create a VSS database
2. Add a 10 MB binary file to it
3. Create 300 revisions of the file in VSS by repeatedly checking out,
changing the file a bit, and checking in (a scripted sketch follows)
4. Convert this database using vss2git
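
For completeness, here is a rough sketch of a driver for step 3. It assumes SourceSafe's ss.exe is on the PATH, that SSDIR points at the database, and that the file was already added as $/project/MyBinaryFile.bin; the paths and exact ss.exe flags are placeholders and may differ between VSS versions.

```csharp
// Rough sketch (not tested against a real VSS install): drive ss.exe to
// create many slightly-different revisions of one binary file. SSDIR, the
// local working folder, and the $/project path are placeholders.
using System;
using System.Diagnostics;
using System.IO;

class MakeRevisions
{
    static void Ss(string args)
    {
        var psi = new ProcessStartInfo("ss", args)
        {
            UseShellExecute = false,
            WorkingDirectory = @"C:\work"   // folder the checkout lands in
        };
        using (var p = Process.Start(psi))
            p.WaitForExit();
    }

    static void Main()
    {
        Environment.SetEnvironmentVariable("SSDIR", @"C:\vssdb");
        var rnd = new Random();
        const string local = @"C:\work\MyBinaryFile.bin";

        for (int i = 1; i <= 300; i++)
        {
            Ss("Checkout \"$/project/MyBinaryFile.bin\" -I-");

            // "Change the file a bit": overwrite one random byte, so each
            // revision differs only slightly from the previous one.
            using (var fs = new FileStream(local, FileMode.Open, FileAccess.ReadWrite))
            {
                fs.Seek(rnd.Next((int)fs.Length), SeekOrigin.Begin);
                fs.WriteByte((byte)rnd.Next(256));
            }

            Ss(string.Format("Checkin \"$/project/MyBinaryFile.bin\" -Crev{0} -I-", i));
        }
    }
}
```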

What is the expected output? What do you see instead?

The output of vss2git is correct as far as I can see. However, the process takes 
an extremely long time (weeks). Scanning the database is fast (<1 minute), but each 
revision takes on the order of hours.

What version of the product are you using? On what operating system?

1.0.10.0, Windows 8.1

Please provide any additional information below.

This is a special case: usually a version control database contains many 
smaller (textual) files with a small number of changes each. In this case, the 
VSS database I work with contains MS Access databases (MDB files), which can 
only be saved monolithically. Hence the large set of revisions for a large 
binary file.

I debugged the process in Visual Studio 2013. At a high level, the time for 
processing each revision is spent like this (a sketch of the pattern follows the list):

* Process a revision for MyBinaryFile.bin, for example revision 1
  * VssFileRevision.GetContents() - for revision 1
    * Get the last revision for MyBinaryFile.bin, for example 300
      * Loop until we're at revision 1
        * Get the delta operations for revision 300
        * Merge them with revision 1
        * Get the previous revision, 299
        * ... etc., until we're at the revision we want (revision 1)
    * Return the contents of the result of all the merges
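
To make the pattern concrete, here is a minimal sketch of what I believe the loop amounts to; this is my reading of the behavior, not vss2git's actual code, and `Delta` and `ApplyReverseDelta` are hypothetical stand-ins:

```csharp
using System;

class Delta { /* the stored reverse-delta operations for one revision */ }

static class Reconstruct
{
    // Stand-in for the expensive Merge step: the real operation rewrites
    // the whole 10 MB buffer, which is what takes minutes per call.
    static byte[] ApplyReverseDelta(byte[] contents, Delta delta)
    {
        return contents; // placeholder
    }

    // To materialize revision `wanted`, start from the newest revision and
    // merge reverse deltas downward until `wanted` is reached.
    public static byte[] GetContents(int wanted, int last,
                                     Func<int, Delta> loadDelta,
                                     byte[] lastContents)
    {
        byte[] result = lastContents;
        for (int rev = last; rev > wanted; rev--)
            result = ApplyReverseDelta(result, loadDelta(rev));
        return result;
    }
}
```

Exporting all 300 revisions this way performs roughly 299 + 298 + ... + 1 ≈ 45,000 merges in total; at the multiple minutes per merge I measured, that adds up to weeks.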

The Merge operation is very expensive; it takes multiple minutes. And because 
of the large number of revisions and the way this loop is set up, it is 
executed many times (one possible mitigation is sketched below).
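
One possible mitigation (a hedged sketch, not something vss2git does today as far as I can tell) would be to memoize the intermediate revisions produced on the way down, so that exporting revision 2 right after revision 1 becomes a cache lookup instead of another 298 merges. Reusing the hypothetical `Delta`/`ApplyReverseDelta` stand-ins from the sketch above:

```csharp
using System;
using System.Collections.Generic;

static class ReconstructCached
{
    // Same hypothetical stand-in as in the sketch above.
    static byte[] ApplyReverseDelta(byte[] contents, Delta delta)
    {
        return contents; // placeholder
    }

    static readonly Dictionary<int, byte[]> cache = new Dictionary<int, byte[]>();

    public static byte[] GetContents(int wanted, int last,
                                     Func<int, Delta> loadDelta,
                                     byte[] lastContents)
    {
        // Start from the nearest already-materialized revision at or above
        // `wanted`, falling back to the newest revision.
        int start = last;
        byte[] result = lastContents;
        for (int rev = wanted; rev < last; rev++)
        {
            byte[] hit;
            if (cache.TryGetValue(rev, out hit))
            {
                start = rev;
                result = hit;
                break;
            }
        }

        for (int rev = start; rev > wanted; rev--)
        {
            result = ApplyReverseDelta(result, loadDelta(rev));
            cache[rev - 1] = result; // remember the intermediate revision
        }
        return result;
    }
}
```

Caching every intermediate buffer of a 10 MB file across 300 revisions costs about 3 GB, so a real implementation would have to evict entries or keep only every k-th revision; but if revisions are exported in ascending order, even this naive cache reduces the total from roughly 45,000 merges to 299.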

I don't understand enough of the structure of VSS databases and of vss2git to 
be able to see whether all of these steps are essential and how they could be 
optimized. I do know that converting the database I'm working with is not 
practical because it takes far too long.

Original issue reported on code.google.com by [email protected] on 12 Feb 2015 at 12:45

@GoogleCodeExporter (Contributor, Author)

Note that I never completed a full run as described; I stopped the process 
after several days and checked the intermediate results. The estimate that the 
complete process would take "weeks" is an extrapolation from this.

Original comment by [email protected] on 12 Feb 2015 at 12:49

@GoogleCodeExporter (Contributor, Author)

We have about 252k revisions in 56k files, and after 2 days of running we are at 
20% progress (about 7 GB). 1 of 4 CPUs is at 100%. So we also think it's 
running slowly; at this rate, the full conversion would take roughly 10 days...

Original comment by [email protected] on 16 Apr 2015 at 2:22

@beppler commented Nov 17, 2020

Hi, I have an issue like this one, but it is for a small binary file (a COM type library).

Every revision gets stuck on "xxxx: Edit revision yyy" for many seconds, sometimes even minutes.

Like the example below:

Replaying changeset 39 from 07/20/2004 19:36:37
D:\Projects\Local\vss-folha-pub\TJRJ\Fontes\Classes\Folha.dpr: Edit revision 7

For other file types it is very fast.

The conversion is running on an SSD drive, with an exclusion set in the antivirus software.
