
4b Human perception

Hou Yujun edited this page Jul 25, 2024 · 2 revisions

Getting human perception scores from street-level imagery. The perception categories are safety, lively, beautiful, wealthy, boring, and depressing.

The scores are on a scale of 0–10.

For safety, lively, beautiful, and wealthy, a high score indicates a strong positive feeling.

For boring and depressing, a high score indicates a strong negative feeling.

Model

The models are pretrained on the MIT Place Pulse 2.0 dataset. The backbone of the model is a Vision Transformer (ViT) pretrained on ImageNet (ViT_B_16_Weights.IMAGENET1K_SWAG_E2E_V1). We added three Linear layers with ReLU activations to the ViT heads for classification.

Code snippet:

# Three-layer MLP replacing the ViT classification head;
# num_fc is the backbone's feature dimension, num_class the number of outputs.
nn.Linear(num_fc, 512, bias=True),
nn.ReLU(True),
nn.Linear(512, 256, bias=True),
nn.ReLU(True),
nn.Linear(256, num_class, bias=True)

The model structure can be found in code/model_training/perception/Model_01.py. The pretrained models will be downloaded automatically when inference.py is run (recommended method). You can also manually download the models here.

How to run the model

Set up the environment with requirements-cv-linux.txt.

Input

The input CSV should:

  • have each row representing an image to process, and
  • contain minimally two columns, named uuid and path, to specify image UUID and the local image file path, respectively
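An input CSV satisfying the two requirements above can be generated like this. The file names and paths are hypothetical examples; only the uuid and path column names come from the spec.

```python
import pandas as pd

# Hypothetical rows; uuid and path are the two required columns.
df = pd.DataFrame({
    "uuid": ["img_0001", "img_0002"],
    "path": ["/data/images/img_0001.jpg", "/data/images/img_0002.jpg"],
})
df.to_csv("input.csv", index=False)
print(df.to_csv(index=False))
```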

Output

One CSV for each perception dimension.

Each CSV contains two columns: the image uuid and the inferred perception score.
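Since each dimension is written to its own CSV, it can be convenient to merge the six files into one wide table keyed on uuid. A minimal sketch with pandas, assuming a score column name (the actual column name in the output CSVs may differ) and using toy in-memory frames in place of the real files:

```python
import pandas as pd
from functools import reduce

def merge_scores(per_dim: dict) -> pd.DataFrame:
    """Merge one score DataFrame per perception dimension on uuid."""
    frames = [df.rename(columns={"score": dim}) for dim, df in per_dim.items()]
    return reduce(lambda a, b: a.merge(b, on="uuid"), frames)

# Toy data standing in for pd.read_csv on each per-dimension output CSV.
per_dim = {
    "safety": pd.DataFrame({"uuid": ["a", "b"], "score": [6.1, 4.3]}),
    "lively": pd.DataFrame({"uuid": ["a", "b"], "score": [7.0, 5.2]}),
}
wide = merge_scores(per_dim)
print(wide.columns.tolist())  # ['uuid', 'safety', 'lively']
```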

To reproduce sample_output

Modify out_Path in inference.py to the directory where you wish to store the output CSVs, then run:

python3 inference.py

To run inference for your own image/images

Modify inference.py:

  1. Modify out_Path to the directory where you wish to store the output CSVs
  2. Modify in_Path to the path of your input CSV

Run:

python3 inference.py

Acknowledgements

Our work in human perception builds on and uses code from human-perception-place-pulse developed by Ouyang (2023).