Skip to content

Commit

Permalink
Fix doc and remove some unused imports.
Browse files Browse the repository at this point in the history
  • Loading branch information
radumarias committed Aug 2, 2024
1 parent 92ac534 commit ade28ed
Show file tree
Hide file tree
Showing 4 changed files with 59 additions and 30 deletions.
1 change: 1 addition & 0 deletions .idea/syncoxiders.iml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions file-tree-merge/src/apply_change.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@ use rayon::prelude::IntoParallelRefIterator;
use crate::change_tree::Change;
use crate::change_tree_merge::{DstItems, HashKind, MergedChanges};
use crate::tree_creator::Item;
use crate::{crc_eq, file_hash, git_add, git_commit, retry, TREE_DIR};
use crate::{crc_eq, file_hash, git_commit, retry};

pub fn apply(
changes: MergedChanges,
Expand Down Expand Up @@ -191,17 +191,17 @@ fn process(
path1_item2: Arc<Mutex<DstItems>>,
path1: &Path,
path2: &Path,
repo1: &Path,
_repo1: &Path,
_repo2: &Path,
dry_run: bool,
checksum: bool,
crc: bool,
ctr: &AtomicU64,
git_lock: Arc<Mutex<()>>,
_git_lock: Arc<Mutex<()>>,
batch_size: usize,
synced: &AtomicU64,
applied_size_since_commit: &AtomicU64,
commit_after_size_bytes: i32,
_commit_after_size_bytes: i32,
print_all_changes: bool,
) -> Result<()> {
let dst = path2.join(path);
Expand Down
2 changes: 1 addition & 1 deletion file-tree-merge/src/tree_creator.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ use colored::Colorize;
use rayon::iter::ParallelIterator;
use rayon::prelude::IntoParallelRefIterator;

use crate::{git_commit, retry, IterRef};
use crate::{retry, IterRef};

// pub(crate) const PATH_SEPARATOR: &str = "/";
pub(crate) const PATH_SEPARATOR: &str = "|";
Expand Down
78 changes: 53 additions & 25 deletions website/pages/poc-demo.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,35 +5,42 @@
`One-way` `One-to-Many` sync for `Add`, `Modify`, `Delete`, `Rename` operations. You can see here a short demo:
[![Watch the video](https://img.youtube.com/vi/JHQC1XpCzQw/0.jpg)](https://www.youtube.com/watch?v=JHQC1XpCzQw)

We'll exemplify for 2 paths, from `path1` to `path2` but it works with multiple paths, it will Sync from `path1` to others.
We'll exemplify for 2 paths, from `path1` to `path2` but it works with multiple paths, it will Sync from `path1` to
others.

**`One-way` sync:**

- have 2 mounted folders with rclone (`path1`, `path2`)
- build changes tree for `path1`
- apply changes from `path1` to `path2` for these operations:
- `Add`, `Modify`, `Delete`, `Rename`

We use `git` to catch the changes, how it works:
- we have a special directory `repo` shared for both endpoints. This will be used to create a git repo for each path and tracks changes. It should persist between runs

- we have a special directory `repo` shared for both endpoints. This will be used to create a git repo for each path and
tracks changes. It should persist between runs
- inside the `repo` for each path we create a `tree` directory and create the tree structure from `path` in there
- in the files content we keep `size` and `mtime` or `MD5 hash` if enabled
- we do `git add .`, then `git status -s` shows what's changed, we use `git2` crate to interact with git
- after we have the changes tree we apply them to `path2`
- on `Add` and `Modify` we check if the file is already present in `path2` and if it's the same as in `path1` we skip it
- on `Add` and `Modify` we check if the file is already present in `path2` and if it's the same as in `path1` we
skip it
- comparison between the files is made using `size`, `mtime` or `MD5 hash`, if enabled
- on `Rename` if the `old` file is not present in the `path2` to move it, we copy it from `path1`
- changes are applied wth WAL logic, we use git changes as WAL
- after we build the changes tree we unstage all changes
- after we applied a change for a file we stage that file in git
- after each `64MB` we write we are checkpointing (we do `git commit`)
- after applying all changes we commit remainig staged ones
- then we delete the history and keep just an index of all files so e can catch new changes
- like this if the process is suddenly interrupted the next time it runs it will see there are changes and will apply them
- this hapens until all pending changes are applied
- after we build the changes tree we unstage all changes
- after we applied a change for a file we stage that file in git
- after each `64MB` we write we are checkpointing (we do `git commit`)
- after applying all changes we commit remainig staged ones
- then we delete the history and keep just an index of all files so e can catch new changes
- like this if the process is suddenly interrupted the next time it runs it will see there are changes and will
apply them
- this hapens until all pending changes are applied

# Using CLI

You can find the binaries [here](https://github.com/radumarias/syncoxiders/actions/workflows/ci.yml). Select the last run and at the bottom you should have binaries for multiple OSs.
You can find the binaries [here](https://github.com/radumarias/syncoxiders/actions/workflows/ci.yml).
Select the last successful run and at the bottom you should have binaries for multiple OSs.

For other targets you could clone the repo and build it, see below how.

Expand All @@ -44,33 +51,49 @@ syncoxiders --repo <REPO> <PATH1> <PATH2>
```

- `inputs` (`<PATH1> <PATH2>`): a lists of paths that will be synced
- `<REPO>`: a directory that should persist between runs, we create a `git` repo with metadata of files from all paths. **MUST NOT BE IN ANY OF THE PATHS**. If it doesn't persist, next time it runs it will see all files as `Add`ed, but will skip copying them if already the same as on the other side
- `<REPO>`: a directory that should persist between runs, we create a `git` repo with metadata of files from all paths.
**MUST NOT BE IN ANY OF THE PATHS**. If it doesn't persist, next time it runs it will see all files as `Add`ed, but
will skip copying them if already the same as on the other side

For now, it does `One-way` sync propagating the changes from `path1` to other paths:

- `Add`, `Modify`, `Delete`, `Rename`
- on `Add` and `Modify` we check if the file is already present in `path2` and if it's the same as in `path1` we skip it
- comparison between the files is made using `size` and `mtime` or `MD5 hash` (if enabled, see `--checksum` param below)
- on `Rename` if the `old` file is not present in the `path2` to move it, we copy it from `path1`

By default it detects changes in files based on `size` and `mtime`. After copying to `path2` it will set `atime` and `mtime` for the files.
By default it detects changes in files based on `size` and `mtime`. After copying to `path2` it will set `atime`
and `mtime` for the files.

Other args:
- `--checksum`: (disabled by default): if specified it will calculate `MD5 hash` for files when detecting changes and when comparing file in `path1` with the file in `path2` when applying `Add` and `Modify` operations.
**This is especially useful if any pf the paths is accessed over the network and doesn't support `mtime` or even `size` or if the clocks are out of sync**
**It will be considerably slower when enabled**
- `--no-crc`: (disabled by default): if specified it will skip `CRC` check after file was transferred. Without this it compares the `CRC` of the file in `path1` with the `CRC` of the file in `path2` after transferred. This ensures the file integrity after transfered.
**Checking `CRC` is highly recommend if any of the paths is accessed over the network.**
- `--dry-run`: this simulates the sync. Will not apply any changes to the paths, will just print the operations that would have normally be applied to paths
- `log-all-changes`: by default it doesn't log each change that is applied, but every 100th change so it won't clutter the logs. Setting this will log all changes

- `--checksum`: (disabled by default): if specified it will calculate `MD5 hash` for files when detecting changes and
when comparing file in `path1` with the file in `path2` when applying `Add` and `Modify` operations.
**This is especially useful if any pf the paths is accessed over the network and doesn't support `mtime` or
even `size` or if the clocks are out of sync**
**It will be considerably slower when enabled**
- `--no-crc`: (disabled by default): if specified it will skip `CRC` check after file was transferred. Without this it
compares the `CRC` of the file in `path1` with the `CRC` of the file in `path2` after transferred. This ensures the
file integrity after transfered.
**Checking `CRC` is highly recommend if any of the paths is accessed over the network.**
- `--dry-run`: this simulates the sync. Will not apply any changes to the paths, will just print the operations that
would have normally be applied to paths
- `log-all-changes`: by default it doesn't log each change that is applied, but every 100th change so it won't clutter
the logs. Setting this will log all changes

## Limitations

- conflicts are not handled yet. If the file is changed in both `path1` and `path2` the winner is the one from `path1`. It's like `master-slave` sync where `path1` is the master
- it doesn't sync any of `Add`, `Delete`, or `Rename` operations on empty folders. This is actually a limitation of `git` as it works only on files. The directory tree will be recreated based on the file parent, but folders with no files in them will not be synced
- conflicts are not handled yet. If the file is changed in both `path1` and `path2` the winner is the one from `path1`.
It's like `master-slave` sync where `path1` is the master
- it doesn't sync any of `Add`, `Delete`, or `Rename` operations on empty folders. This is actually a limitation
of `git` as it works only on files. The directory tree will be recreated based on the file parent, but folders with no
files in them will not be synced

## Troubleshooting

In case you experience any inconsistencies in the way the files are synced, or not synced, you can delete the `repo` directory and run it again. It will see all files as new but will not copy them to the oher sides if already present and the same, it will just copy the new or changed ones.
In case you experience any inconsistencies in the way the files are synced, or not synced, you can delete the `repo`
directory and run it again. It will see all files as new but will not copy them to the oher sides if already present and
the same, it will just copy the new or changed ones.

## Compile it from source code

Expand All @@ -91,9 +114,13 @@ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

#### Configuring the PATH environment variable

In the Rust development environment, all tools are installed to the `~/.cargo/bin` directory, and this is where you will find the Rust toolchain, including rustc, cargo, and rustup.
In the Rust development environment, all tools are installed to the `~/.cargo/bin` directory, and this is where you will
find the Rust toolchain, including rustc, cargo, and rustup.

Accordingly, it is customary for Rust developers to include this directory in their `PATH` environment variable. During installation rustup will attempt to configure the `PATH`. Because of differences between platforms, command shells, and bugs in rustup, the modifications to PATH may not take effect until the console is restarted, or the user is logged out, or it may not succeed at all.
Accordingly, it is customary for Rust developers to include this directory in their `PATH` environment variable. During
installation rustup will attempt to configure the `PATH`. Because of differences between platforms, command shells, and
bugs in rustup, the modifications to PATH may not take effect until the console is restarted, or the user is logged out,
or it may not succeed at all.

If, after installation, running `rustc --version` in the console fails, this is the most likely reason.

Expand All @@ -108,6 +135,7 @@ $HOME/.cargo/env
```bash
cargo build --release
```

### Run it

```bash
Expand Down

0 comments on commit ade28ed

Please sign in to comment.