Set Icon's format using Headers if there's no file extension #42

philgyford · 2023-08-10T14:43:26Z

Occasionally I encounter favicons that don't have a file extension, e.g. https://secure.gravatar.com/blavatar/bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49?s=32

Getting this results in a list of Icon objects like this, with an empty format:

Icon(
    url='https://secure.gravatar.com/blavatar/bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49?s=32',
    width=16,
    height=16,
    format=''
)

In a situation like this could/should favicon use the response headers from requests to determine the format instead? For example, doing:

response = requests.get("https://secure.gravatar.com/blavatar/bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49?s=32")

then response.headers includes:

'Content-Type': 'image/jpeg',
'Content-Disposition': 'inline; filename="bd4bda4207561b6998f10dec44b570f04ff4072b20f89162d525b186dfca3e49.jpeg"'

Perhaps fall back to using one of those to determine the likely file extension? At the moment, from outside favicon, it's impossible to get this data without manually using requests again myself.
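
To make the suggestion concrete, here is a minimal sketch of such a fallback. It is not part of favicon; the function name `format_from_headers` and the `MIME_TO_FORMAT` table are hypothetical, and the mapping is only an assumption about which types are worth recognising. It tries Content-Type first and then the filename in Content-Disposition, using the standard library's header parsing.

```python
from email.message import Message
from pathlib import PurePosixPath

# Hypothetical mapping from MIME type to file extension (an assumption,
# not something favicon defines). Extend as needed.
MIME_TO_FORMAT = {
    "image/gif": "gif",
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/vnd.microsoft.icon": "ico",
    "image/x-icon": "ico",
    "image/webp": "webp",
}


def format_from_headers(headers):
    """Return a likely file extension for a response, or '' if unknown."""
    # Copy the two relevant headers into a Message so the stdlib can
    # parse parameters such as '; charset=...' or 'filename="..."'.
    msg = Message()
    for name, value in headers.items():
        if name.lower() in ("content-type", "content-disposition"):
            msg[name] = value

    # Prefer Content-Type. get_content_type() drops parameters and
    # lowercases the value.
    if msg["Content-Type"] is not None:
        fmt = MIME_TO_FORMAT.get(msg.get_content_type())
        if fmt:
            return fmt

    # Otherwise fall back to the extension of the suggested filename.
    filename = msg.get_filename()
    if filename:
        return PurePosixPath(filename).suffix.lstrip(".").lower()

    return ""
```

With the headers shown above, `format_from_headers(response.headers)` would take the Content-Type branch and never need to look at Content-Disposition.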

(Is this project still maintained?)


philgyford · 2023-08-23T11:37:49Z

For what it's worth, in my own code I now use the Content-Type when fetching my chosen icon, if no file extension was found earlier.

In case it helps anyone else, this is roughly what I have. Call get_favicon() with the URL of a website.

import logging

import favicon
import requests


# We'll only use images with these extensions
ACCEPTED_FILE_EXTENSIONS = ["gif", "jpeg", "jpg", "png", "ico", "webp"]

# Or, if it has no extension, we'll only use images with these MIME types
ACCEPTED_MIME_TYPES = {
    "image/gif": "gif",
    "image/jpeg": "jpg",
    "image/png": "png",
    "image/vnd.microsoft.icon": "ico",
    "image/webp": "webp",
}

logger = logging.getLogger(__name__)


def get_favicon(website_url):

    icons = favicon.get(website_url, timeout=2)

    if len(icons) == 0:
        logger.warning("No favicons found.")
        return False

    favicon_url = icons[0].url
    favicon_format = icons[0].format
 
    if favicon_format != "" and favicon_format not in ACCEPTED_FILE_EXTENSIONS:
        logger.warning(f"'{favicon_format}' is not an accepted file extension. Abandoning.")
        return False

    try:
        response = requests.get(icons[0].url, stream=True, timeout=2)
        response.raise_for_status()
    except requests.exceptions.HTTPError as err:
        logger.error(f"HTTP error fetching favicon: {err}")
        return False
    except requests.exceptions.RequestException as err:
        logger.error(f"Error fetching favicon: {err}")
        return False

    if favicon_format == "":
        # Need to use Content-Type to determine format.
        if "Content-Type" in response.headers:
            # Drop any parameters (e.g. "; charset=...") before the lookup.
            content_type = response.headers["Content-Type"].split(";")[0].strip().lower()
            if content_type in ACCEPTED_MIME_TYPES:
                favicon_format = ACCEPTED_MIME_TYPES[content_type]
            else:
                logger.warning("Favicon not an accepted mime type. Abandoning.")
                return False
        else:
            logger.warning("No file extension or Content-Type. Abandoning.")
            return False

    # Now favicon_format is set, and you can do whatever you need
    # with response.content, which contains the icon data.
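
As one example of "whatever you need", the icon could be written to disk under a name built from the detected format. This is only a sketch: `save_icon`, its `stem` parameter and the default file name are hypothetical and not part of favicon or requests.

```python
from pathlib import Path


def save_icon(content, favicon_format, directory, stem="favicon"):
    """Write icon bytes to <directory>/<stem>.<favicon_format>; return the path."""
    path = Path(directory) / f"{stem}.{favicon_format}"
    path.write_bytes(content)
    return path
```

Inside get_favicon() this could be called as `save_icon(response.content, favicon_format, "some/directory")`, where the directory is assumed to exist already.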
