
2. Usage 🎮

John Yang edited this page Jun 27, 2023 · 1 revision

Upon installing InterCode locally, there are several ways you can interact with and learn more about the InterCode environment.

Interactive Shell

The root directory of the repository contains several scripts for interacting with InterCode's current suite of environments. Running python run_<env>.py will initialize an interpreter allowing you to interact with the corresponding environment.

For instance, upon running python run_bash.py, you should see the following output:

INFO    Loaded dataset from ./data/test/bash_queries.json
INFO    Environment Initialized
INFO    * Note *: `reset` should be explicitly called to load new task episode
INFO    -------------
        New task episode initialized
INFO    Query: Search for all files that contain the string 'text file' in
        their name or content under the directory /testbed
INFO    Gold Command: grep -r 'text file' /testbed
> pwd
INFO    Action: pwd
INFO    Observation: /
>

Under the hood, an instance of the BashEnv environment has been initialized and a new task episode has been loaded, as indicated by the Query and Gold Command fields.

The > denotes standard input, where you may enter a bash command (in general, an action) as you might in a real terminal to interact with the environment. Upon entering a command, the result of executing the command in the given environment is set as the observation.

The goal of this task is to modify the environment and produce standard output such that the specifications of the natural language query are met. Passing in the submit keyword terminates the current task episode and produces a reward value and info dictionary that describe the correctness of the given actions in answering the original query (calculated with respect to the effects of the Gold command).
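As an illustrative sketch of this scoring idea (not InterCode's actual implementation), a reward of this kind can be computed by comparing the results produced by the agent's actions against those produced by the gold command:

```python
# Illustrative sketch only: InterCode's real scoring compares the environment
# state and output produced by the agent's actions against the effects of the
# gold command; the function below captures the idea with plain sets.
def score(agent_output: set, gold_output: set) -> float:
    """Return 1.0 when the agent reproduced the gold command's effects,
    otherwise the fraction of gold items it recovered."""
    if not gold_output:
        return 1.0 if not agent_output else 0.0
    return len(agent_output & gold_output) / len(gold_output)

# The gold command `grep -r 'text file' /testbed` matches these files;
# an agent whose actions surface both earns the full reward.
gold = {"/testbed/dir3/subdir1/subsubdir1/textfile3.txt",
        "/testbed/dir2/subdir1/textfile2.txt"}
print(score(gold, gold))   # 1.0
print(score(set(), gold))  # 0.0
```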

> grep -r 'text file' /testbed
INFO     Action: grep -r 'text file' /testbed
INFO     Observation: /testbed/dir3/subdir1/subsubdir1/textfile3.txt
         :Yet another text file /testbed/dir2/subdir1/textfile2.txt...
> submit
INFO     Action: submit
INFO     Info: { 'environment': 'ic-bash', 'reward': 1, 'info': {...
INFO     Reward: 1.0
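The same reset/step loop can also be driven programmatically. The toy class below is only a stand-in that mirrors the control flow of the transcript above; the real environment classes, which execute actions inside Docker containers, live in the repository's intercode package:

```python
# Toy stand-in for an InterCode environment, illustrating the reset/step
# control flow described above. Real environments run actions in Docker
# containers and score `submit` against the gold command's effects.
class ToyBashEnv:
    def reset(self):
        """Load a new task episode and return its query."""
        self.done = False
        return "Query: print the working directory"

    def step(self, action: str):
        """Execute one action; return (observation, reward, done, info)."""
        if action == "submit":
            self.done = True
            # The reward describes how well prior actions answered the query.
            return "", 1.0, True, {"environment": "ic-bash"}
        observation = "/" if action == "pwd" else ""
        return observation, 0.0, False, {}

env = ToyBashEnv()
env.reset()
obs, reward, done, info = env.step("pwd")     # obs == "/"
obs, reward, done, info = env.step("submit")  # done is True, reward is 1.0
```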

Experiments

The directory structure of this repository makes it straightforward to write and run your own experiments on any InterCode environment.

  • The models/ folder serves as the main store of logic for training or running inference on local models or API endpoints.
  • The experiments/ folder contains a number of examples of how agents and models defined in the models/ folder can then be deployed to an InterCode environment.

The current experiments/ folder contains the code for the experiments discussed in the InterCode paper, and these can be invoked via the following call pattern from the root directory of this repository.

python -m experiments.<module name> <flags>

Configuration

Our experiments rely on external model APIs, so depending on the models you wish to run, you need to provide the corresponding key. To do so, create a keys.cfg file in the root directory of the repository, then copy the following template into keys.cfg and fill in your desired keys:

# OPENAI_API_KEY: "" ## <Your OpenAI API Key here>
# PALM_API_KEY: "" ## <Your PaLM-2 API key here>
# HF_TOKEN: "" ## <Your Hugging Face access token here>
# HF_API_URL: "" ## <Your Hugging Face Endpoint URL here>
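As an illustrative sketch (InterCode's actual config parsing may differ), a keys.cfg file in the `KEY: "value"` format above can be read and exported as environment variables like so:

```python
import os
import re

def load_keys(path: str) -> dict:
    """Parse lines of the form KEY: "value" from a keys.cfg file, skipping
    commented-out lines, and export each key as an environment variable.
    Illustrative sketch; not InterCode's actual config loader."""
    keys = {}
    with open(path) as f:
        for line in f:
            match = re.match(r'\s*([A-Z_]+)\s*:\s*"([^"]*)"', line)
            if match:
                name, value = match.groups()
                keys[name] = value
                os.environ[name] = value
    return keys
```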

You can also export them as environment variables.

For example, to set the OpenAI key, use the following in Windows (setx only takes effect in newly opened terminals):

setx OPENAI_API_KEY "<yourkey>"
echo %OPENAI_API_KEY%

and in Linux (the example below writes to ~/.zshrc; use ~/.bashrc if your shell is bash):

echo "export OPENAI_API_KEY='yourkey'" >> ~/.zshrc
source ~/.zshrc
echo $OPENAI_API_KEY

Available Environments

Bash

The bash environment can be accessed via the run_bash.py script. The provided Dockerfile uses Ubuntu as the base image and sets up a file system suitable for testing commands from the NL2Bash dataset.

SQL

The SQL environment can be accessed via the run_sql.py script. The provided Dockerfile uses MySQL as the base image and initializes a set of MySQL databases suitable for testing commands from the Spider dataset.

CTF

Each CTF task has its own self-contained execution environment derived from IntercodeEnv. The task sets up this environment by loading a specific Docker image that provides a Bash shell. Following the task query, the agent begins in the ctf directory and tries to solve the challenge of finding the hidden flag. Once the agent is confident it has found the flag, it submits the flag to receive a reward.

  • Action space: any command that can be run on a Bash shell + submit
  • Rewards:
    • +1 for submitting the correct flag
    • 0 for submitting an incorrect flag
  • Episode end:
    • Termination: Happens when the agent finds and submits the correct flag
    • Truncation: When the number of turns in an episode exceeds 15 (can be configured)
  • Tasks reference: PicoCTF
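The reward and episode-end rules above can be sketched as follows (an illustrative stand-in, not the repository's implementation; the flag format follows PicoCTF conventions):

```python
# Illustrative sketch of the CTF reward and episode-end rules above;
# the actual logic lives in InterCode's CTF environment class.
MAX_TURNS = 15  # truncation limit (configurable in the real environment)

def ctf_step(action: str, true_flag: str, turn: int):
    """Return (reward, terminated, truncated) for one agent action."""
    if action.startswith("submit "):
        submitted = action[len("submit "):]
        correct = submitted == true_flag
        # +1 for the correct flag, 0 for an incorrect one; a correct
        # submission terminates the episode.
        return (1 if correct else 0), correct, False
    # Any other Bash command earns no reward; truncate past the turn limit.
    return 0, False, turn > MAX_TURNS
```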