
Add support for rankllama #294

Merged

1 commit merged into amzn:mainline on Aug 13, 2024

Conversation

aniquetahir (Contributor)

This pull request adds support for RankLlama with LoRA. Detailed instructions are included in README.md.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@aniquetahir aniquetahir marked this pull request as draft August 6, 2024 17:15
@aniquetahir aniquetahir marked this pull request as ready for review August 6, 2024 17:16
@OctoberChang OctoberChang self-assigned this Aug 6, 2024
@aniquetahir aniquetahir force-pushed the xmr_argparse branch 7 times, most recently from 2b78a90 to dd94777, on August 6, 2024 18:47
@aniquetahir aniquetahir marked this pull request as draft August 6, 2024 18:50
@aniquetahir aniquetahir marked this pull request as ready for review August 6, 2024 18:57
pecos/xmr/reranker/trainer.py (outdated review thread, resolved)
@@ -0,0 +1,91 @@
# PECOS XMR Reranker

Contributor

Some general comments about the README.md:

  • Should we also introduce the data schema for training/inference?
  • For the Command Line Usage (CLI), we have pecos.xmr.reranker.train. Should we also have pecos.xmr.reranker.predict?
  • Do we want to support Python API usage?

Contributor

Another high-level comment: can we avoid hard-coding the input columns and make them configurable in the config JSON file? Some hard-coded columns, for example:

  • Line 79 of data_utils.py: keywords
  • Line 110 of data_utils.py: contents, titles
  • Lines 296-298 of model.py: inp_id, ret_idxs, rel

aniquetahir (Contributor, Author) on Aug 8, 2024

> Another high-level comment: can we avoid hard-coding the input columns and make them configurable in the config JSON file? Some hard-coded columns, for example:
>
> * Line 79 of `data_utils.py`: `keywords`
> * Line 110 of `data_utils.py`: `contents`, `titles`
> * Lines 296-298 of `model.py`: `inp_id`, `ret_idxs`, `rel`

This can now be specified in the configuration. I added details in the README.md.
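
For context, a purely hypothetical sketch of what such a configurable-column section could look like; the key names below are assumptions, not the PR's actual schema (the README added in this PR documents the real one):

```python
# Hypothetical illustration only: key names such as "query_column" are made up.
# The point is that the previously hard-coded column names become configuration data.
import json

column_config = {
    "query_column": "keywords",                       # was hard-coded at data_utils.py line 79
    "passage_columns": ["contents", "titles"],        # was hard-coded at data_utils.py line 110
    "ranking_columns": ["inp_id", "ret_idxs", "rel"],  # was hard-coded at model.py lines 296-298
}

print(json.dumps(column_config, indent=2))  # the shape a config JSON section might take
```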

> Some general comments about the README.md:
>
> * Should we also introduce the data schema for training/inference?
> * For the Command Line Usage (CLI), we have `pecos.xmr.reranker.train`. Should we also have `pecos.xmr.reranker.predict`?
> * Do we want to support Python API usage?

Added predictions.

pecos/xmr/reranker/model.py (outdated review thread, resolved)
params: The model parameters (RankingModelParams)
"""
training_args = train_params.training_args
training_args.remove_unused_columns = False

Contributor

Is this line still necessary?

aniquetahir (Contributor, Author)

Since we are adding additional information to the output of the collate function, this is needed to avoid it being removed by the trainer.
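
For readers unfamiliar with this Trainer behavior, a minimal sketch (the values below are assumptions, not the PR's actual training setup) of why the flag matters:

```python
# With remove_unused_columns=True (the default), the HF Trainer drops dataset
# columns whose names are not parameters of model.forward() before batching, so
# any extra bookkeeping fields the custom collate function relies on would never
# reach it. Setting the flag to False preserves those columns.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./tmp_reranker",   # placeholder path, assumption
    remove_unused_columns=False,   # keep the extra fields for the custom collator
)
```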

"""
Enable gradient checkpointing for the model
"""
self.hf_model.enable_input_require_grads()

Contributor

Not sure if this is the right place to call hf_model.enable_input_require_grads().
In the Tevatron RankLlaMA implementation (https://github.com/texttron/tevatron/blob/main/src/tevatron/reranker/modeling.py#L79), it is called only when both LoRA and training_args.gradient_checkpointing are enabled.

aniquetahir (Contributor, Author)

The following sequence of operations is expected (a rough sketch of the pattern follows the list):

  1. First, get_modifed_model is called.
  2. The trainer then calls gradient_checkpointing_enable when gradient checkpointing is enabled.
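
For illustration, a minimal sketch of the Tevatron-style pattern referenced above. This is not the PR's `get_modifed_model` implementation; the helper name and arguments below are assumptions:

```python
# Illustrative sketch only, not the PR's code. Following the Tevatron RankLlaMA
# pattern, enable_input_require_grads() is needed only when LoRA is combined with
# gradient checkpointing: with the base model frozen, the embedding outputs do not
# require grad, and checkpointed blocks would then break the backward pass through
# the LoRA adapters. enable_input_require_grads() forces those outputs to require grad.
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification, TrainingArguments


def build_reranker(base_name: str, use_lora: bool, training_args: TrainingArguments):
    # Single relevance-score head (num_labels=1 is an assumption for a pointwise reranker).
    model = AutoModelForSequenceClassification.from_pretrained(base_name, num_labels=1)
    if use_lora:
        if training_args.gradient_checkpointing:
            model.enable_input_require_grads()
        model = get_peft_model(model, LoraConfig(task_type=TaskType.SEQ_CLS))
    # The HF Trainer itself calls model.gradient_checkpointing_enable() once
    # training starts, whenever training_args.gradient_checkpointing is True.
    return model
```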

@aniquetahir aniquetahir marked this pull request as draft August 8, 2024 16:51
@aniquetahir aniquetahir force-pushed the xmr_argparse branch 6 times, most recently from 87cc9f6 to 63a083b, on August 8, 2024 17:39
@aniquetahir aniquetahir marked this pull request as ready for review August 8, 2024 17:40
@aniquetahir aniquetahir marked this pull request as draft August 8, 2024 18:22
@aniquetahir aniquetahir marked this pull request as ready for review August 8, 2024 18:25
lbl_idxs: List[int],
):
"""
Collate function for training. Tokenizes the input and return features and returns the collated batch.

Contributor

Docstring seems to be outdated.

model_params (RankingModel.ModelParams): the model parameters
train_params (RankingModel.TrainParams): the training parameters
Returns:
An instance of UberGlobalModel

Contributor

Docstring seems to be outdated. Remove UberGlobalModel.

aniquetahir (Contributor, Author)

Updated docstrings

@aniquetahir aniquetahir marked this pull request as draft August 8, 2024 21:59
@aniquetahir aniquetahir marked this pull request as ready for review August 8, 2024 22:03
setup.py (outdated review thread)
-    'transformers>=4.4.2; python_version>="3.9"'
+    'transformers>=4.4.2; python_version>="3.9"',
+    'tqdm>=4.66.4',
+    'peft>=0.11.0',

OctoberChang (Contributor) on Aug 12, 2024

The peft package requires python_version >= 3.8, which is why the latest unit test failed.

Please also check the minimal supported Python version for the other libraries you introduce.
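
As an aside, one way to express such constraints is with PEP 508 environment markers in install_requires; this is a sketch only, and the markers actually chosen in the merged setup.py may differ:

```python
# Sketch of PEP 508 environment markers in setup.py (values are assumptions).
# A requirement carrying a marker is simply skipped on interpreters that do not
# satisfy it, so older Pythons no longer try (and fail) to install peft.
from setuptools import setup

setup(
    name="pecos",  # other metadata omitted in this sketch
    install_requires=[
        'transformers>=4.4.2; python_version>="3.9"',
        'tqdm>=4.66.4',
        'peft>=0.11.0; python_version>="3.8"',  # peft itself needs Python >= 3.8
    ],
)
```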

@aniquetahir aniquetahir marked this pull request as ready for review August 12, 2024 23:46

OctoberChang (Contributor) left a comment

LGTM.

@OctoberChang OctoberChang merged commit ea254b0 into amzn:mainline Aug 13, 2024
25 checks passed