tz_world.update taking over double the memory on OTP 27.0.1 #38

Open
Jdyn opened this issue Jul 22, 2024 · 14 comments

Comments

@Jdyn

Jdyn commented Jul 22, 2024

EDIT: I narrowed it down further and was able to build on Elixir 1.17.2 with OTP 26.2.5.2, so the problem appears to be OTP 27.0.1.

I am on tz_world 1.3.3.

Hey, I've updated to Elixir 1.17 but am unable to deploy because mix tz_world.update uses significantly more memory than it did on 1.15.

Here are the two Dockerfiles. The 1.15.7 image builds without problems; the 1.17 image is OOM-killed after reaching 4 GB of memory usage.

Elixir 1.15.7, OTP 25.3.2.7 (working image)

  # Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
  # instead of Alpine to avoid DNS resolution issues in production.
  #
  # https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
  # https://hub.docker.com/_/ubuntu?tab=tags
  #
  # This file is based on these images:
  #
  #   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
  #   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
  #   - https://pkgs.org/ - resource for finding needed packages
  #   - Ex: hexpm/elixir:1.15.7-erlang-25.3.2.7-debian-bullseye-20231009-slim
  #
  ARG ELIXIR_VERSION=1.15.7
  ARG OTP_VERSION=25.3.2.7
  ARG DEBIAN_VERSION=bullseye-20231009-slim

  ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
  ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

  FROM ${BUILDER_IMAGE} as builder

  # install build dependencies
  RUN apt-get update -y && apt-get install -y build-essential git \
      && apt-get clean && rm -f /var/lib/apt/lists/*_*

  # prepare build dir
  WORKDIR /app

  # install hex + rebar
  RUN mix local.hex --force && \
      mix local.rebar --force

  # set build ENV
  ENV MIX_ENV="prod"

  # install mix dependencies
  COPY mix.exs mix.lock ./
  RUN mix deps.get --only $MIX_ENV
  RUN mkdir config

  # copy compile-time config files before we compile dependencies
  # to ensure any relevant config change will trigger the dependencies
  # to be re-compiled.
  COPY config/config.exs config/${MIX_ENV}.exs config/
  RUN mix deps.compile

  COPY priv priv

  COPY lib lib

  # Compile the release
  RUN mix compile

  # Changes to config/runtime.exs don't require recompiling the code
  COPY config/runtime.exs config/

  COPY rel rel
  RUN mix tz_world.update
  RUN mix release

  # start a new build stage so that the final image will only contain
  # the compiled release and other runtime necessities
  FROM ${RUNNER_IMAGE}

  RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

  # Set the locale
  RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

  ENV LANG en_US.UTF-8
  ENV LANGUAGE en_US:en
  ENV LC_ALL en_US.UTF-8

  WORKDIR "/app"
  RUN chown nobody /app

  # set runner ENV
  ENV MIX_ENV="prod"

  # Only copy the final release from the build stage
  COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/nimble ./

  USER nobody

  # If using an environment that doesn't automatically reap zombie processes, it is
  # advised to add an init process such as tini via `apt-get install`
  # above and adding an entrypoint. See https://github.com/krallin/tini for details
  # ENTRYPOINT ["/tini", "--"]

  CMD ["/app/bin/server"]

Elixir 1.17.2, OTP 27.0.1 (OOM)

  # Find eligible builder and runner images on Docker Hub. We use Ubuntu/Debian
  # instead of Alpine to avoid DNS resolution issues in production.
  #
  # https://hub.docker.com/r/hexpm/elixir/tags?page=1&name=ubuntu
  # https://hub.docker.com/_/ubuntu?tab=tags
  #
  # This file is based on these images:
  #
  #   - https://hub.docker.com/r/hexpm/elixir/tags - for the build image
  #   - https://hub.docker.com/_/debian?tab=tags&page=1&name=bullseye-20231009-slim - for the release image
  #   - https://pkgs.org/ - resource for finding needed packages
  #   - Ex: hexpm/elixir:1.15.7-erlang-25.3.2.7-debian-bullseye-20231009-slim
  #
  ARG ELIXIR_VERSION=1.17.2
  ARG OTP_VERSION=27.0.1
  ARG DEBIAN_VERSION=buster-20240612-slim

  ARG BUILDER_IMAGE="hexpm/elixir:${ELIXIR_VERSION}-erlang-${OTP_VERSION}-debian-${DEBIAN_VERSION}"
  ARG RUNNER_IMAGE="debian:${DEBIAN_VERSION}"

  FROM ${BUILDER_IMAGE} as builder

  # install build dependencies
  RUN apt-get update -y && apt-get install -y build-essential git \
      && apt-get clean && rm -f /var/lib/apt/lists/*_*

  # prepare build dir
  WORKDIR /app

  # install hex + rebar
  RUN mix local.hex --force && \
      mix local.rebar --force

  # set build ENV
  ENV MIX_ENV="prod"

  # install mix dependencies
  COPY mix.exs mix.lock ./
  RUN mix deps.get --only $MIX_ENV
  RUN mkdir config

  # copy compile-time config files before we compile dependencies
  # to ensure any relevant config change will trigger the dependencies
  # to be re-compiled.
  COPY config/config.exs config/${MIX_ENV}.exs config/
  RUN mix deps.compile

  COPY priv priv

  COPY lib lib

  # Compile the release
  RUN mix compile

  # Changes to config/runtime.exs don't require recompiling the code
  COPY config/runtime.exs config/

  COPY rel rel
  RUN mix tz_world.update
  RUN mix release

  # start a new build stage so that the final image will only contain
  # the compiled release and other runtime necessities
  FROM ${RUNNER_IMAGE}

  RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales ca-certificates \
    && apt-get clean && rm -f /var/lib/apt/lists/*_*

  # Set the locale
  RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && locale-gen

  ENV LANG en_US.UTF-8
  ENV LANGUAGE en_US:en
  ENV LC_ALL en_US.UTF-8

  WORKDIR "/app"
  RUN chown nobody /app

  # set runner ENV
  ENV MIX_ENV="prod"

  # Only copy the final release from the build stage
  COPY --from=builder --chown=nobody:root /app/_build/${MIX_ENV}/rel/nimble ./

  USER nobody

  # If using an environment that doesn't automatically reap zombie processes, it is
  # advised to add an init process such as tini via `apt-get install`
  # above and adding an entrypoint. See https://github.com/krallin/tini for details
  # ENTRYPOINT ["/tini", "--"]

  CMD ["/app/bin/server"]

[Screenshot 2024-07-22 091754: memory usage graph for the two builds]

  • The first spike is the attempted build on Elixir 1.17.2. Note that this build crashes before completing because I have a 4 GB build limit, so it is likely using even more memory than shown.
  • The second, later spike is the build on Elixir 1.15.7, which completes successfully.

Any ideas?

@Jdyn changed the title from "tz_world.update taking over double the memory on Elixir 1.17" to "tz_world.update taking over double the memory on OTP 27.0.1" on Jul 22, 2024
@kipcole9
Collaborator

That's definitely unexpected. And there haven't been any commits that should affect the mix task. I see you changed the issue description to reflect an OTP 27 difference, rather than an Elixir 1.17 difference. Does that mean you see the memory difference with the same Elixir version but different OTP version?

I will certainly take a look at this, but it might take a couple of days to try and diagnose.

@kipcole9
Collaborator

And I'll experiment with using the :json module in OTP 27 to see whether that makes an immediate difference.

@Jdyn
Author

Jdyn commented Jul 24, 2024

> That's definitely unexpected. And there haven't been any commits that should affect the mix task. I see you changed the issue description to reflect an OTP 27 difference, rather than an Elixir 1.17 difference. Does that mean you see the memory difference with the same Elixir version but different OTP version?
>
> I will certainly take a look at this, but it might take a couple of days to try and diagnose.

I can confirm that, for me, the memory difference is caused by the OTP version. Building with OTP 25 or 26 and Elixir 1.15 through 1.17 shows the same memory usage during tz_world.update. When I introduce OTP 27, memory usage at least doubles, though I cannot see the peak because my machine OOMs before the task completes.

@kipcole9
Collaborator

Thanks very much for the diagnostic. There are some things I can try to reduce memory usage, and I'll experiment on the weekend. The function most likely to be the main memory consumer is:

  def transform_source_data(source_data, version) when is_binary(source_data) do
    case :zip.unzip(source_data, [:memory]) do
      {:ok, [{_, json} | _rest]} ->
        json
        |> Jason.decode!()
        |> Geo.JSON.decode!()
        |> Map.get(:geometries)
        |> Enum.map(&update_map_keys/1)
        |> Enum.map(&calculate_bounding_box/1)
        |> List.insert_at(0, version)

      error ->
        raise RuntimeError, "Unable to unzip downloaded data. Error: #{inspect error}"
    end
  end

With that in mind I can try:

  1. Jason.decode!(strings: :copy) (the default is :reference), since the issue may be that binaries are not being garbage collected
  2. Switch to the new :json module in OTP 27 and see if that makes a difference
  3. See if I can use Jaxon's streaming JSON decoder

I will put two development branches together now so you can test (1) and (2). I'll look at (3) over the weekend.

And somehow I have to find a reproducible case I can submit to the OTP team.
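
To illustrate why option (1) could matter: a small string that is only a reference into a large binary keeps the whole large binary alive. The following is a minimal sketch using only standard library calls; the data is a made-up stand-in for the decoded JSON, and it does not reproduce what Jason does internally.

```elixir
# A large binary standing in for the unzipped JSON source (hypothetical data).
big = :binary.copy("timezone-data;", 100_000)

# Pattern matching yields a sub-binary that still points into `big`.
<<ref::binary-size(1000), _rest::binary>> = big

# :binary.copy/1 creates an independent binary holding only those 1000 bytes.
copy = :binary.copy(ref)

# How many bytes each small value actually keeps reachable.
IO.inspect(:binary.referenced_byte_size(ref), label: "bytes referenced by sub-binary")
IO.inspect(:binary.referenced_byte_size(copy), label: "bytes referenced by copy")
```

If the sub-binary reports the size of the whole source while the copy reports only its own size, then retaining references in long-lived data would prevent the source binary from being collected.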

@kipcole9
Collaborator

I've done some basic experiments and I see no material difference between using :json and using Jason. Curiously, I also see no material difference in memory usage between OTP 26 and OTP 27 on my iMac Pro.

That means that (1) and (2) don't appear to make any material difference in memory consumption. I also added a call to :erlang.garbage_collect after decoding the JSON and that also made no material difference.
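
For anyone who wants to check on their own platform whether a forced collection releases binary memory, here is a rough sketch. GcProbe is a hypothetical module, not part of tz_world, and the 64 MiB allocation is just a stand-in for the decoded data.

```elixir
defmodule GcProbe do
  @mib 1024 * 1024

  # BEAM-wide memory currently allocated for binaries, in whole MiB.
  def binary_mib, do: div(:erlang.memory(:binary), @mib)

  def measure do
    baseline = binary_mib()
    {peak, _size} = allocate_and_measure()
    # The large binary is no longer referenced here, so a forced
    # collection of this process should be able to release it.
    :erlang.garbage_collect()
    %{baseline: baseline, peak: peak, after_gc: binary_mib()}
  end

  defp allocate_and_measure do
    big = :binary.copy(<<0>>, 64 * @mib)
    mib = binary_mib()
    # Returning the size keeps `big` alive while memory is sampled.
    {mib, byte_size(big)}
  end
end

IO.inspect(GcProbe.measure(), label: "binary memory (MiB)")
```

Comparing the three numbers on macOS and inside the Linux Docker image, on both OTP 26 and OTP 27, might help narrow down where the behaviour differs.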

@Jdyn
Author

Jdyn commented Jul 24, 2024

That's interesting. This isn't my strongest area, but perhaps it is a memory leak involving Linux and OTP 27, since it seemingly only happens in this Linux Docker image. I did provide the Debian image I am building with. Feel free to take your time, as OTP 26 is working well for me.

@Jdyn
Author

Jdyn commented Aug 7, 2024

Could be related?
erlang/otp#8682

@peaceful-james

peaceful-james commented Sep 6, 2024

> Could be related? erlang/otp#8682

This seems like the problem in my case. I have plenty of memory but am getting runtime crashes on boot that look like other segfault bugs I have seen in the past.

Update: actually, I am seeing my application crash even with TzWorld.Backend.Memory. It is not an OOM problem.

The only error I see is this:

{exit,terminating,[{application_controller,call,2,[{file,"application_controller.erl"},{line,511}]},{application,enqueue_or_start,6,[{file,"application.erl"},{line,380}]},{application,ensure_all_started,3,[{file,"application.erl"},{line,359}]},{elixir,start_cli,0,[{file,"src/elixir.erl"},{line,195}]},{init,start_it,1,[]},{init,start_em,1,[]},{init,do_boot,3,[]}]}

Language versions:
erlang 27.0.1
elixir 1.17.2-otp-27

Another update:
The problem went away when I upgraded :tz from 0.27.1 to 0.27.2
@Jdyn can you try this?

@kipcole9
Collaborator

kipcole9 commented Sep 6, 2024

I've pushed [a commit](https://github.com/kipcole9/tz_world/commit/aad71d3815bf4b16a438ac8d4a07b5f7e125a5d4) that attempts to garbage collect more aggressively the large binaries that are generated during the update process.

I don't think this is a comprehensive solution, but I'd be interested to hear whether it makes a difference in your situations.

mix.exs

def deps() do
  [
    {:tz_world, github: "kipcole9/tz_world"}
  ]
end

Mix task

I've added a new --trace argument that does limited tracing and memory profiling.

% mix tz_world.update --trace

Feedback most definitely welcome.

@kipcole9
Collaborator

kipcole9 commented Sep 6, 2024

> The problem went away when I upgraded :tz from 0.27.1 to 0.27.2

Very interesting - but I don't think it relates to this particular issue?

@peaceful-james

> Very interesting - but I don't think it relates to this particular issue?

You are right.

@keyhanjk

Could be related?
60-bit process/port identifiers on 64-bit runtime

@kipcole9
Collaborator

kipcole9 commented Sep 28, 2024

I'm thinking about an overhaul of the whole updating process. Comments would be most welcome.

  1. Instead of downloading and locally processing the updated tz data, I'll process it once and store the timezones_geodata.etf file in the repo using Git LFS. The update process would then download the relevant .etf file and generate the .dets file from it. The .dets file is nearly 1 GB, so downloading it directly doesn't seem appropriate. The downloaded files would be stored as something like:
    a. timezones_without_oceans_2024b.etf.zip and
    b. timezones_with_oceans_2024b.etf.zip

  2. There are a number of backends implemented, but as best I can tell TzWorld.Backend.DetsWithIndexCache is the most useful one. I therefore propose deprecating the other backends, removing them from the documentation, and then removing them altogether. I'll still leave the option to define other backends in case there is a particular use case or technology that needs one.

  3. I plan to move the repo of record from kimlai/tz_world to kipcole9/tz_world. kimlai is the original author and there won't be any change to the record of his authorship. This step is just to simplify my maintenance of the library (kimlai is no longer an active maintainer and hasn't been for a couple of years).

I will ship one more maintenance release on the current structure that supports geo version 4.0 and then I'll work on the above unless there is feedback otherwise.
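
As a rough illustration of the .etf idea in point 1, the sketch below writes a term to a file in Erlang external term format and reads it back. EtfSketch, the file name and the sample geometry are all made up for the example; the real file layout and loading code in tz_world may well differ.

```elixir
defmodule EtfSketch do
  # Serialize any term to a compressed ETF file.
  def write!(path, term) do
    File.write!(path, :erlang.term_to_binary(term, [:compressed]))
  end

  # Load the term back; no JSON decoding is involved at this point.
  def read!(path) do
    path |> File.read!() |> :erlang.binary_to_term()
  end
end

path = Path.join(System.tmp_dir!(), "timezones_sketch.etf")

# Hypothetical data: a version followed by the transformed geometries.
data = ["2024b", %{timezone: "Europe/London", bounding_box: {-8.6, 49.9, 1.8, 60.9}}]

EtfSketch.write!(path, data)
restored = EtfSketch.read!(path)
File.rm!(path)

IO.inspect(restored == data, label: "round trip preserved the data")
```

With this approach the JSON decoding and transformation would happen once, ahead of time, and the update task on the user's machine would only need to deserialize the prepared term.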

@kipcole9
Collaborator

A further update here as I am about to publish a new release. This release has the mix tz_world.update --trace flag and some attempts at reducing memory use. I now see a peak usage of 1.7 GB (BEAM-reported memory usage) on macOS on both OTP 26 and OTP 27.

% mix tz_world.update --trace
09:32:41.549 [debug] [52 MiB] Retrieved list of 21 available timezone data releases.
09:32:41.549 [info] [TzWorld] No timezone geo data installed. Installing the latest release 2024b.
09:33:04.268 [debug] [1712 MiB] Transforming source data
09:33:12.340 [debug] [1226 MiB] Transformed source data
09:33:18.489 [debug] [235 MiB] Compressed data into a zip file
09:33:18.489 [debug] [97 MiB] Reloading timezone data
09:33:23.355 [debug] [603 MiB] Reloaded timezone data

Any feedback about what you observe will be most welcome.
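
If you want to add comparable memory markers to your own code while investigating, something along these lines should work. TraceSketch is a hypothetical helper, and it assumes the figure of interest is the BEAM's total allocated memory; which figure the --trace flag actually reports is not stated here.

```elixir
defmodule TraceSketch do
  @mib 1024 * 1024

  # Prefix a message with the BEAM's total allocated memory in whole MiB.
  def format(message) do
    mib = div(:erlang.memory(:total), @mib)
    "[#{mib} MiB] #{message}"
  end
end

IO.puts(TraceSketch.format("Transforming source data"))
```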

4 participants