Suggestion: better support for working with / repairing DV files that have timecode incoherency / partially missing subcode data #929

Open
JohnstonJ opened this issue Aug 2, 2024 · 1 comment

@JohnstonJ

Summary of the idea

Repair DV files that have missing subcode DIF data, so that they don't cause problems later on when merging, or when using with other tools. For example, merging these files results in both dropped frames and newly-duplicated frames.

Problematic input data: partial subcode DIFs

I'm capturing Digital8 tapes using a Sony DCR-TRV460 camcorder. The tapes were originally recorded on an older consumer Sony camcorder.

I'm noticing that the DV data has incomplete / erroneous subcode DIF data. For example:

image

Note that there are a fair number of what appear to be dropouts: it looks like identical subcode data is supposed to be regularly repeated throughout the frame, but is often replaced with 0xFF bytes. Usually this happens at the end of the DIF block, but it can happen in the middle or at the beginning as well.
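
As a rough way to quantify these dropouts, the sketch below counts which pack slots in a single subcode DIF block contain only 0xFF. It assumes the commonly documented layout of an 80-byte subcode DIF block (a 3-byte ID, then six 8-byte sync blocks whose last 5 bytes are the pack, then reserved bytes). The function name and constants are hypothetical and are not taken from DVRescue or DV Analyzer.

```python
DIF_BLOCK_SIZE = 80  # assumed size of one DIF block
ID_SIZE = 3          # assumed length of the DIF block ID
SSYB_SIZE = 8        # assumed: 3 leading bytes + 5-byte pack per sync block
SSYB_COUNT = 6       # assumed sync blocks per subcode DIF block
PACK_SIZE = 5        # assumed pack length


def empty_pack_slots(block):
    """Return the indices of pack slots whose 5 payload bytes are all 0xFF."""
    if len(block) != DIF_BLOCK_SIZE:
        raise ValueError("expected an 80-byte DIF block")
    empty = []
    for i in range(SSYB_COUNT):
        # Skip the block ID, earlier sync blocks, and this sync block's leading bytes.
        start = ID_SIZE + i * SSYB_SIZE + (SSYB_SIZE - PACK_SIZE)
        if block[start:start + PACK_SIZE] == b"\xff" * PACK_SIZE:
            empty.append(i)
    return empty
```

A slot that is all 0xFF is not proof of a dropout on its own, since an empty pack can be legitimate; the count is only meaningful when compared against the other subcode DIF blocks in the same frame.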

Curiously, there were no substantial/systematic errors in the video or audio data for the most part. The subcode data, by far, seems to be the most disproportionately affected. I don't know why this is, but the DV file is nevertheless damaged in this way.

How the incomplete subcode data will impact analysis

This issue with the subcodes becomes noticeable in the following ways:

  • It creates discontinuities in the DV timestamps when analyzing the file in DVRescue. Some (but not all) frames seem to get the previous frame's timecode in DVRescue's analysis tool. Note that this does not actually indicate that video data has been repeated: using the previous/next frame buttons in the playback tool to examine the frame + its neighbors shows that every frame is unique.
    image

    Output of DV Analyzer tool:

    Percent of frames with Error: 18.55%
    Percent of frames with Error (including Arbitrary bit inconsistency): 18.56%
    Percent of frames with Video Error Concealment: 0.01%
    Percent of frames with Audio Errors: 0.01%
    Percent of frames with Timecode Incoherency: 18.54%
    Percent of frames with Arbitrary bit inconsistency: 0.00%
  • DV Analyzer reports large numbers of "DV timecode incoherencies". The number of impacted frames is orders of magnitude greater than what DVRescue reports.
    image
  • DV Analyzer and DVRescue report very different results from analyzing the same file. I will use frame 37227 as an example:
    • The frame has several subcode DIF dropouts, similar to the first screenshot of the hexadecimal, above.
    • DVRescue reports that it is a repeated frame / duplicated timecode, with a value of 00:20:42;04
    • DV Analyzer does not report any problem at all with this frame. It does not flag any DV timecode incoherency. It does not flag any repeated timecode. It reports that the timecode for that frame is 00:20:42;05.
    • Therefore, it would appear that the tools have very different ways of reading the timecode from a frame when the subcode DIFs are incomplete. (The fact that two different QA tools are reporting substantially different results is somewhat concerning to me!)

How this breaks the Merge feature of DVRescue

Most concerningly, this timecode issue cascades into real problems when using the Merge feature in DVRescue. Remember, the vast majority of video data in my sample is intact. Unfortunately, DVRescue makes the problem worse in my situation. I suspect the merge feature is putting a lot of trust in the accuracy of the timecodes, and then mismatches the frames during the merge when the timecodes are incorrectly interpreted.

I have observed these problems when using Merge with these sorts of files:

  • Entire frames are being dropped:
    • I captured one tape 5 times. All captures have the exact same file size / frame count. The frame numbers for every frame are identical across all files - i.e. frame 37227 will be the identical video data in each capture file.
    • Yet, when merging these 5 DV files in DVRescue, the output file is short by many frames!
  • DVRescue is creating duplicate frames / video data where none previously existed! I have seen it take 100% perfectly good video data from a frame, and replace it with frame data from a neighboring frame - all because the timecode data was slightly off.

The number of problems it incorrectly "fixed" due to these timecode issues greatly exceeded the actual number of video errors I had in one of my better capture passes. Therefore, using the merge feature simply didn't make sense.

Suggested fix: add a feature for totally rewriting subcodes

An idea/suggestion for dealing with this problem: what if DVRescue offered a way to systematically rewrite the timecodes in a single file? It doesn't take a genius to look at the DV Analyzer output and see what the correct timecodes are supposed to be, even if a frame or two gets interpreted with the wrong timecode. DVRescue could add a new feature / function that takes a single DV file as input, and writes a repaired DV file as output. Here's a rough idea of how I imagine it might be able to work:

  1. Each frame in the input file is carefully analyzed to obtain the correct timecode from the subcode DIFs. Since timecodes appear to be duplicated across many subcode DIFs within a frame, it should be possible to (a) merge all 20 subcode DIFs into a single pair of subcode DIFs, using a "most common byte" strategy or similar, and (b) read the timecode from this merged result.
  2. It's possible that a particularly damaged frame might still not give us a reasonable timecode. In that case, we can examine the neighboring frames and extrapolate what the timecode was supposed to be. For example, if the timecodes say we have frames 13, 14, 15, 3, 17, 18, 19..... we can assume that frame 3 was supposed to be frame 16. (Some tuning parameters might be needed to avoid impacting real/actual discontinuities in the timecode, which should appear over larger numbers of frames.)
  3. The output DV file can then be written with subcode DIF structures that have been completely regenerated from scratch, with no dropouts whatsoever. This will ensure consistent behavior in all software that reads DV data and utilizes its timecodes, and eliminate all incoherencies seen by both DV Analyzer and DVRescue. What you see in one tool will be what you get in any other tool!
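
Steps 1 and 2 above could be sketched roughly as follows. This is only an illustration of the idea, not how DVRescue works: it assumes the redundant subcode copies have already been extracted as equal-length byte strings, that dropped bytes read back as 0xFF, and that timecodes have been converted to absolute frame counts. The function names are hypothetical.

```python
from collections import Counter

DROPOUT = 0xFF  # assumption: dropped bytes read back as 0xFF


def merge_copies(copies):
    """Per-byte majority vote across equal-length copies, ignoring 0xFF."""
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must have equal length")
    merged = bytearray()
    for column in zip(*copies):
        votes = Counter(b for b in column if b != DROPOUT)
        # If every copy dropped out at this position, keep 0xFF.
        merged.append(votes.most_common(1)[0][0] if votes else DROPOUT)
    return bytes(merged)


def repair_isolated_outliers(frame_numbers):
    """Fix a single value that breaks an otherwise consecutive run."""
    fixed = list(frame_numbers)
    for i in range(1, len(fixed) - 1):
        prev, nxt = fixed[i - 1], fixed[i + 1]
        # Only touch a frame whose two neighbors agree on what it should be.
        if nxt - prev == 2 and fixed[i] != prev + 1:
            fixed[i] = prev + 1
    return fixed


print(repair_isolated_outliers([13, 14, 15, 3, 17, 18, 19]))
# → [13, 14, 15, 16, 17, 18, 19]
```

Because a value is only replaced when both neighbors imply the same expected number, a genuine jump such as 13, 14, 15, 100, 101 is left alone. A real implementation would need the tuning mentioned in step 2, for example to handle two damaged frames in a row.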

This strategy could then be used with DVRescue to merge multiple files as follows:

  1. Each input file will first have its timecodes repaired using the above feature.
  2. The files can then be merged using the normal DVRescue merge feature.

Conceivably, this could simply be (optionally) done as a preprocessing step of the Merge feature itself. But keep in mind that it might still be useful as a standalone tool.

Example test case

I've attached the first couple hundred frames as a test case. Due to GitHub file attachment limits, it is a multi-part ZIP file:

  1. IncoherentTimecodes.zip.001.zip
  2. IncoherentTimecodes.zip.002.zip
  3. IncoherentTimecodes.zip.003.zip
  4. IncoherentTimecodes.zip.004.zip
  5. IncoherentTimecodes.zip.005.zip

To extract this:

  1. Download all the files.
  2. Remove the final .zip extension, i.e. IncoherentTimecodes.zip.003.zip --> IncoherentTimecodes.zip.003
  3. Use a tool like 7-Zip to extract the ZIP archive as normal.

The same 200 frames were captured 5 different times. All of the captures show several analysis errors in DVRescue except for the fourth pass, which shows none, apparently thanks to a lucky capture attempt. All of them will show analysis errors in DV Analyzer.

Next, try merging them. Use Johnston3-pass1.dv as the initial file to merge.

Here is a screenshot of the merge, showing several "fixed" frames:
image

Unfortunately, it created a new duplicate frame, where none existed before:
image

Note that the sequence number is now showing some errors, where it did not previously do so. Most unfortunately, frame 64 is now a total duplicate of frame 65, whereas frame 64 contained unique video data in all the input files. The true video data for frame 64 has now been lost.

@dericed
Contributor

dericed commented Aug 20, 2024

Hi @JohnstonJ, there's a lot to comment on here and I really appreciate your report. A dvfixer would be super helpful, but we didn't have time in our project to work on something like that. Still, we're hoping to find a way to continue the project.

As far as dvrescue vs dvanalyzer, the development teams were the same, but the tools were independently developed. The dvrescue approach to analysis is a completely new start rather than a build upon dvanalyzer. There are a number of heuristics that would lead to the tools acting differently. For instance, I think dvanalyzer gives each frame a timecode based on the first found timecode, whereas dvrescue reads the timecodes from all DV DIF sequences and goes with the most commonly occurring timecode within the frame.
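
To illustrate how those two heuristics can disagree on the same frame, here is a minimal sketch. It is not code from either tool: the function names are hypothetical, and the input is a made-up list holding one timecode per DIF sequence, with `None` standing for a sequence whose timecode could not be read.

```python
from collections import Counter


def first_found(timecodes):
    """Return the first readable timecode in a frame, or None."""
    for tc in timecodes:
        if tc is not None:
            return tc
    return None


def most_common(timecodes):
    """Return the most frequent readable timecode in a frame, or None."""
    votes = Counter(tc for tc in timecodes if tc is not None)
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical frame: one readable value up front, a different majority after it.
per_sequence = ["00:20:42;05", None, "00:20:42;04", "00:20:42;04", "00:20:42;04"]
print(first_found(per_sequence))   # → 00:20:42;05
print(most_common(per_sequence))   # → 00:20:42;04
```

With partially missing subcode data, either strategy can land on the wrong value, which would be consistent with the two tools reporting different timecodes for frame 37227.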

Pinging @JeromeMartinez to take a look at the samples attached.
