
IndexError: list index out of range (New) #279

Open
wooemans opened this issue Sep 7, 2024 · 0 comments


wooemans commented Sep 7, 2024

xiaoran@xiaorandeMacBook-Pro ~ % marker_single "/Users/xiaoran/Downloads/Decree books/_I AM_ DECREE BOOKLET - BOOK 4.pdf" "/Users/xiaoran/Downloads" --batch_multiplier 2 --langs English
Loading detection model vikp/surya_det2 on device cpu with dtype torch.float32
Loading detection model vikp/surya_layout2 on device cpu with dtype torch.float32
Loading reading order model vikp/surya_order on device cpu with dtype torch.float32
Loaded texify model to cpu with torch.float32 dtype
Detecting bboxes: 100%|███████████████████████| 52/52 [1:20:39<00:00, 93.07s/it]
Loading recognition model vikp/surya_rec on device cpu with dtype torch.float32
Recognizing Text: 100%|███████████████████████████| 4/4 [04:48<00:00, 72.10s/it]
Detecting bboxes: 100%|██████████████████████| 35/35 [1:07:42<00:00, 116.07s/it]
Finding reading order: 100%|████████████████████| 35/35 [47:59<00:00, 82.28s/it]
Traceback (most recent call last):
  File "/usr/local/bin/marker_single", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/convert_single.py", line 26, in main
    full_text, images, out_meta = convert_single_pdf(fname, model_lst, max_pages=args.max_pages, langs=langs, batch_multiplier=args.batch_multiplier)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/marker/convert.py", line 134, in convert_single_pdf
    extract_images(doc, pages)
  File "/usr/local/lib/python3.12/site-packages/marker/images/extract.py", line 72, in extract_images
    extract_page_images(page_obj, page)
  File "/usr/local/lib/python3.12/site-packages/marker/images/extract.py", line 42, in extract_page_images
    block = page.blocks[block_idx]
            ~~~~~~~~~~~^^^^^^^^^^^
IndexError: list index out of range
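The traceback ends in `extract_page_images`, where `page.blocks[block_idx]` is looked up with an index beyond the end of the page's block list. The snippet below is a minimal, hypothetical sketch of that failure mode and of a bounds-checked lookup. `Page`, `Block`, `image_block_indices` and `extract_page_images_safe` are illustrative stand-ins, not Marker's actual classes or API. Skipping the bad index is only one possible workaround, and it does not explain why the index is out of range in the first place.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for Marker's page/block objects; the real classes differ.
@dataclass
class Block:
    text: str


@dataclass
class Page:
    blocks: list[Block] = field(default_factory=list)


def extract_page_images_safe(page: Page, image_block_indices: list[int]) -> list[int]:
    """Return only the block indices that actually exist on the page.

    image_block_indices is a hypothetical list of positions computed earlier
    (for example, from layout detection). It can be stale if blocks were
    merged or dropped afterwards, which is one way an IndexError can arise.
    """
    usable = []
    for block_idx in image_block_indices:
        # Reject negatives as well: Python would silently wrap them to the end.
        if not 0 <= block_idx < len(page.blocks):
            continue  # an unguarded page.blocks[block_idx] would raise IndexError
        usable.append(block_idx)
    return usable


page = Page(blocks=[Block("heading"), Block("paragraph")])

try:
    page.blocks[5]  # same kind of lookup as in the traceback
except IndexError as exc:
    print(exc)  # list index out of range

print(extract_page_images_safe(page, [0, 5, 1]))  # [0, 1]
```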

Attachment 1 is the PDF that fails to convert. I tried removing the blank pages from it, but the error persists. Some files with more pages than this one convert successfully, while some files with fewer pages hit the same error, such as the file in Attachment 2.

Could you please advise how to resolve this? I have already installed the latest version of Marker, but the problem persists.
I AM DECREE BOOKLET - BOOK 4.pdf
I AM DECREES SERIES 2 (1).pdf
