feat: yet another attempt to add windows builds #231

Draft
wants to merge 37 commits into
base: main

baszalmstra
Member

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

Fixes #32

This PR is another attempt to add Windows builds (see #134).

For now I disabled all other builds to be able to test the Windows part first. I marked this PR as a draft so we don't accidentally merge it.

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

  • It looks like the 'libtorch' output doesn't have any tests.

@baszalmstra baszalmstra marked this pull request as draft April 5, 2024 13:00
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

  • Old-style Python selectors (py27, py35, etc) are only available for Python 2.7, 3.4, 3.5, and 3.6. Please use explicit comparisons with the integer py, e.g. # [py==37] or # [py>=37]. See lines [54]

For recipe:

  • It looks like the 'libtorch' output doesn't have any tests.

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

I do have some suggestions for making it better though...

For recipe:

  • It looks like the 'libtorch' output doesn't have any tests.

@baszalmstra
Member Author

Both pipelines failed because they ran out of disk space:

FAILED: caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/runtime/static/te_wrapper.cpp.obj 
C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\cl.exe  /nologo /TP -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DCPUINFO_SUPPORTED_PLATFORM=1 -DFMT_HEADER_ONLY=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNOMINMAX -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_MIMALLOC -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_DEPRECATE=1 -D_UCRT_LEGACY_INFINITY -Dtorch_cpu_EXPORTS -I%SRC_DIR%\build\aten\src -I%SRC_DIR%\aten\src -I%SRC_DIR%\build -I%SRC_DIR% -I%SRC_DIR%\third_party\onnx -I%SRC_DIR%\build\third_party\onnx -I%SRC_DIR%\third_party\foxi -I%SRC_DIR%\build\third_party\foxi -I%SRC_DIR%\third_party\mimalloc\include -I%SRC_DIR%\torch\csrc\api -I%SRC_DIR%\torch\csrc\api\include -I%SRC_DIR%\caffe2\aten\src\TH -I%SRC_DIR%\build\caffe2\aten\src\TH -I%SRC_DIR%\build\caffe2\aten\src -I%SRC_DIR%\build\caffe2\..\aten\src -I%SRC_DIR%\torch\csrc -I%SRC_DIR%\third_party\miniz-2.1.0 -I%SRC_DIR%\third_party\kineto\libkineto\include -I%SRC_DIR%\third_party\kineto\libkineto\src -I%SRC_DIR%\aten\src\ATen\.. -I%SRC_DIR%\c10\.. 
-I%SRC_DIR%\third_party\pthreadpool\include -I%SRC_DIR%\third_party\cpuinfo\include -I%SRC_DIR%\third_party\fbgemm\include -I%SRC_DIR%\third_party\fbgemm -I%SRC_DIR%\third_party\fbgemm\third_party\asmjit\src -I%SRC_DIR%\third_party\ittapi\src\ittnotify -I%SRC_DIR%\third_party\FP16\include -I%SRC_DIR%\third_party\fmt\include -I%SRC_DIR%\build\third_party\ideep\mkl-dnn\include -I%SRC_DIR%\third_party\ideep\mkl-dnn\src\..\include -I%SRC_DIR%\third_party\flatbuffers\include -external:I%SRC_DIR%\build\third_party\gloo -external:I%SRC_DIR%\cmake\..\third_party\gloo -external:I%SRC_DIR%\third_party\protobuf\src -external:I%SRC_DIR%\third_party\XNNPACK\include -external:I%SRC_DIR%\third_party\ittapi\include -external:I%SRC_DIR%\cmake\..\third_party\eigen -external:I%SRC_DIR%\third_party\ideep\mkl-dnn\include\oneapi\dnnl -external:I%SRC_DIR%\third_party\ideep\include -external:I%SRC_DIR%\caffe2 -external:W0 /DWIN32 /D_WINDOWS /GR /EHsc /bigobj /FS -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /utf-8 /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273 -DHAVE_AVX512_CPU_DEFINITION -DHAVE_AVX2_CPU_DEFINITION /O2 /Ob2 /DNDEBUG /bigobj -DNDEBUG -std:c++17 -MD -DCAFFE2_USE_GLOO -DTH_HAVE_THREAD /EHsc /bigobj -O2 -DONNX_BUILD_MAIN_LIB -openmp:experimental /showIncludes /Focaffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\jit\runtime\static\te_wrapper.cpp.obj /Fdcaffe2\CMakeFiles\torch_cpu.dir\ /FS -c %SRC_DIR%\torch\csrc\jit\runtime\static\te_wrapper.cpp
%SRC_DIR%\torch\csrc\jit\runtime\static\te_wrapper.cpp : fatal error C1085: Cannot write compiler generated file: '%SRC_DIR%\build\caffe2\CMakeFiles\torch_cpu.dir\__\torch\csrc\jit\runtime\static\te_wrapper.cpp.obj': No space left on device

What would be the most idiomatic way to solve this issue?

@weiji14
Member

weiji14 commented Apr 6, 2024

Try following https://conda-forge.org/docs/maintainer/conda_forge_yml/#azure to clear some disk space. Set this in conda-forge.yml

azure:
  free_disk_space: true

and then rerender the feedstock.

@Tobias-Fischer
Contributor

Tobias-Fischer commented Apr 6, 2024

I think there’s little we can do - the Azure free disk space setting is already enabled. I’d try and see if these build locally. Perhaps there is a way to use the Quansight servers for Windows as well, the same way they are used for Linux builds? If not, I guess if there are some volunteers to build these locally then this would be an option - I did that for aarch64 for a while for qt. Conda-forge has a windows server too, but disk space has always been quite restricted there too so it might be a bit of a pain.

@jakirkham
Member

Perhaps cross-compiling Windows from Linux is worth trying? Here is a different feedstock PR that does this (conda-forge/polars-feedstock#187).

If we were to use Quansight resources for Windows, being able to run the build on Linux (so cross-compiling) would be very helpful.

@baszalmstra
Member Author

Try following conda-forge.org/docs/maintainer/conda_forge_yml/#azure to clear some disk space. Set this in conda-forge.yml

azure:
  free_disk_space: true

Sadly, that's already set:

free_disk_space: true

I think there’s little we can do - the Azure free disk space setting is already enabled. I’d try and see if these build locally. Perhaps there is a way to use the Quantstack servers for Windows as well, the same way they are used for Linux builds?

I assume you mean the runners provided through open-gpu-server by Quansight and MetroStar? This PR only builds the CPU-only version, but if we also start building for CUDA I think this is the only possible way forward (let alone for other related repositories like tensorflow). However, the open-gpu-servers don't seem to provide any Windows images. Do you know who I should contact to get the ball rolling?

If not, I guess if there are some volunteers to build these locally then this would be an option

That would be an option, but I'd prefer to automate and open-source things as much as possible. Having something hooked up to this repository would be ideal.

Perhaps cross-compiling Windows from Linux is worth trying?

The native code of the example you linked is using Rust which makes this much easier. I doubt that this would be easy to achieve with pytorch.

@baszalmstra
Member Author

I also expect another error when actual linking starts. On my local machine that takes at least 16GB of memory. The CUDA version will most likely require more.

@jakirkham
Member

Perhaps cross-compiling Windows from Linux is worth trying?

The native code of the example you linked is using Rust which makes this much easier. I doubt that this would be easy to achieve with pytorch.

If we don't try, we won't know

@baszalmstra
Member Author

If we don't try, we won't know

Although that is technically true, it's already hard enough to build pytorch natively. Adding cross-compilation to the mix seems to me to complicate this even further. I'd much rather first focus on getting native builds working, even if we need to modify the infrastructure to do so. I think having the ability to do resource-intensive Windows builds would be a huge benefit for the conda-forge ecosystem in general.

However, if all else fails cross-compiling seems like a worthwhile avenue to explore.

@bkpoon
Member

bkpoon commented Apr 6, 2024

One thing to try is to move the build from D:\ to a directory that you have write access to on C:\. I have done this on a personal feedstock where I needed much more disk space. You can modify your conda-forge.yml file with

azure:
  settings_win:
    variables:
      CONDA_BLD_PATH: C:\\Miniconda\\envs\\

You should have roughly 70 GB free on C:\.
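
The amount of free space can be checked from a build step before committing to a location. The following is a minimal sketch in Python, assuming Python is available on the agent; `free_gib` is a hypothetical helper name, not something from this feedstock.

```python
import shutil


def free_gib(path):
    """Return the free space, in GiB, on the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / 2**30


if __name__ == "__main__":
    # On the Azure Windows agent one would pass "C:\\" and "D:\\" here;
    # "." is used so that the sketch runs on any platform.
    print(f"free space: {free_gib('.'):.1f} GiB")
```

Printing this for both drives at the start of the build would show whether the 70 GB estimate holds for the image in use.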

@baszalmstra
Member Author

Thanks! I added that to the PR. I quickly searched GitHub and it seems c:\bld\ is used more often, so I tried that.

@bkpoon
Member

bkpoon commented Apr 6, 2024

Just make sure that the directory exists and is writeable. Also, you need to rerender for the variable to be set. This comment should trigger the bot.

@conda-forge-admin, please rerender
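
The "exists and is writeable" check can be sketched as below. This assumes Python is available before the build starts; `ensure_writable_dir` is a hypothetical helper, and the path in the usage note is the one mentioned earlier in this thread.

```python
import os
import tempfile


def ensure_writable_dir(path):
    """Create `path` if needed and report whether a file can be written there."""
    try:
        os.makedirs(path, exist_ok=True)
        # Creating a real temporary file exercises the actual permissions.
        with tempfile.TemporaryFile(dir=path):
            pass
        return True
    except OSError:
        return False
```

For example, `ensure_writable_dir(r"C:\bld")` would return `False` if the directory cannot be created or written to by the build user.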

@hmaarrfk
Contributor

hmaarrfk commented Apr 6, 2024

This PR only builds the CPU-only version, but if we also start building for CUDA I think this is the only possible way forward (let alone for other related repositories like tensorflow). However, the open-gpu-servers don't seem to provide any Windows images. Do you know who I should contact to get the ball rolling?

A bit of history. Back when this feedstock was created 6 years ago, PyTorch officially suggested that people install one of two distinct packages, pytorch-cpu or pytorch-gpu. Therefore it felt appropriate to create a pytorch-cpu package, because it would throw an error for those trying to install pytorch-gpu. These instructions have since changed upstream.

I personally feel like for windows users, we would HURT their experience to not have a GPU package in 2024.

@baszalmstra
Member Author

I personally feel like for windows users, we would HURT their experience to not have a GPU package in 2024.

Couldn't agree more. I started with CPU only to be able to make incremental progress. My goal is definitely to be able to build the CUDA version too!

@hmaarrfk
Contributor

hmaarrfk commented Apr 9, 2024

Well, a few things:

  1. I might try to build locally.
  2. Once it works locally for one Python version, I might try to enable the megabuilds. When you build locally, it saves all the pytorch library compilation and makes compilation take "1.2x" the time instead of the "4x" caused by recompiling the library for each Python version.
  3. Try to enable cuda using the CI.

Typically we "stop" the compilation on the CIs when we reach your stage (seems like it is working OK enough...).
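
The "1.2x" versus "4x" figures follow from simple arithmetic. The sketch below assumes four Python versions and a 14:1 split between library and per-Python binding cost; that split is an illustrative assumption chosen to reproduce the quoted figures, not a measurement.

```python
def separate_builds(n_pythons, lib_cost, binding_cost):
    """Total cost when every Python variant recompiles the library."""
    return n_pythons * (lib_cost + binding_cost)


def megabuild(n_pythons, lib_cost, binding_cost):
    """Total cost when the library is compiled once and only bindings repeat."""
    return lib_cost + n_pythons * binding_cost


if __name__ == "__main__":
    # Assumed split (hypothetical): library = 14 units, bindings = 1 unit.
    lib, binding, n = 14, 1, 4
    single = lib + binding  # cost of one complete build for one Python
    print(separate_builds(n, lib, binding) / single)  # 4.0
    print(megabuild(n, lib, binding) / single)        # 1.2
```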

@Tobias-Fischer
Contributor

Hi @baszalmstra @hmaarrfk - do you have any updates on this? It would be amazing to see this happen :)!

@baszalmstra
Member Author

@Tobias-Fischer I'm still working on the CUDA builds, but it's a slow process because it takes ages to build them locally, so iteration times are suuuper slow.

In parallel we are also looking into getting large Windows runners into the conda-forge infrastructure.

@baszalmstra
Member Author

Small update:

[image]

I have something compiling locally. Still lots of issues (like Windows builds of pytorch 2.1.2 don't compile with python 3.12) but making steady progress. Currently getting megabuilds to work. Will push when I have something reliably working.

@baszalmstra
Member Author

baszalmstra commented May 12, 2024

I got to the testing stage and noticed this:

- OMP_NUM_THREADS=4 python ./test/run_test.py || true # [not win and not (aarch64 and cuda_compiler_version != "None")]

However, this seems to always fail with the following (from the logs of the latest release):

Ignoring disabled issues:  ['']
Unable to import boto3. Will not be emitting metrics.... Reason: No module named 'boto3'
Missing pip dependency: pytest-rerunfailures, please run `pip install -r .ci/docker/requirements-ci.txt`

Some dependencies are missing. Particularly:

  • pytest-rerunfailures
  • pytest-shard (not on conda-forge)
  • pytest-flakefinder (not on conda-forge)
  • pytest-xdist

(as can be seen here https://github.com/pytorch/pytorch/blob/6c8c5ad5eaf47a62fafbb4a2747198cbffbf1ff0/test/run_test.py#L1705)

Given that the test is allowed to fail (due to || true), should we just remove it? Or put in the effort to fix these tests?
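
To see which of these are absent in a given test environment before running the suite, a check along the following lines could be used. This is a sketch only: `missing_distributions` is a hypothetical helper, and it is not how run_test.py itself performs the check.

```python
from importlib import metadata


def missing_distributions(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    wanted = [
        "pytest-rerunfailures",
        "pytest-shard",
        "pytest-flakefinder",
        "pytest-xdist",
    ]
    print(missing_distributions(wanted))
```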

@h-vetinari
Member

Given that the test is allowed to fail (due to || true), should we just remove it? Or put in the effort to fix these tests?

The more we fix, the better. If it's really a lot of failures, we might not fix it right away (though depending on the severity of the failures, we might want to think twice about releasing something in that state).

In any case, let's leave the testing in, add the required dependencies, and pick up as many fixes as we can.

@h-vetinari
Member

FWIW, cuda builds seem to fail with

C:\bld\libtorch_1720774565902\_h_env\Library\include\cuda_runtime.h(83): fatal error C1083: Cannot open include file: 'crt/host_config.h': No such file or directory

@baszalmstra
Member Author

FWIW, cuda builds seem to fail with

C:\bld\libtorch_1720774565902\_h_env\Library\include\cuda_runtime.h(83): fatal error C1083: Cannot open include file: 'crt/host_config.h': No such file or directory

@h-vetinari

This file is added by cuda-nvcc-dev_win-64, but only in the build environment. cuda-crt also includes this file (which is why I added it to the host section). cuda-crt, however, seems to be available only from CUDA 12.2 onwards. Would it be appropriate to depend on cuda-nvcc-dev_win-64 (e.g. compiler(cuda)) in the host section for lower CUDA versions (<12.2), and use cuda-crt for >=12.2? (Also, do you know what exactly this package is? 😅)

@baszalmstra
Member Author

Ah, it appears the nvcc activation script sets the CUDA_CFLAGS environment variable, which includes the build prefix include directory. I might be able to use that somehow.
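
One way to inspect what that variable contributes is to pull the include directories out of it. The sketch below is hedged: the exact format of CUDA_CFLAGS on Windows is not shown in this thread, so it assumes whitespace-separated flags of the form `-I<dir>`, `/I<dir>`, `-I <dir>` or `/I <dir>`, with no spaces inside paths. `include_dirs` is a hypothetical helper.

```python
import os


def include_dirs(flags):
    """Extract include directories from a whitespace-separated flag string."""
    dirs = []
    tokens = flags.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-I", "/I") and i + 1 < len(tokens):
            # Flag and directory are separate tokens.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith(("-I", "/I")) and len(tok) > 2:
            # Directory is attached directly to the flag.
            dirs.append(tok[2:])
        i += 1
    return dirs


if __name__ == "__main__":
    print(include_dirs(os.environ.get("CUDA_CFLAGS", "")))
```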

@baszalmstra
Member Author

I think I figured out the issue: conda-forge/cuda-nvcc-feedstock#47

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe/meta.yaml) and found it was in an excellent condition.

@h-vetinari
Member

Would it be appropriate to depend on cuda-nvcc-dev_win-64 (e.g. compiler(cuda)) in the host section for lower cuda versions (<12.2)? And use cuda-crt for >=12.2?

I'm not necessarily the best person to ask about the CUDA stack - I think the issue you opened is a good start (which'll probably have to be fixed in any case).

On the other hand, if something doesn't work for 12.0 but is available for later CUDA versions, I don't think it would be an issue to increase the cuda_compiler_version here (in CBC; for Windows only), especially as we're trying to unlock one of the trickiest builds here. AFAIU from conda-forge/conda-forge-pinning-feedstock#5613, 12.4 is ready to go, for example.

@baszalmstra
Member Author

@h-vetinari

Yeah, I think the issue I found is indeed the problem. I mimicked the behavior of the Linux activation script to a degree and I was able to successfully build win_64_blas_implmklcuda_compilercuda-nvcccuda_compiler_version12.0 locally.

Let's see what CI here does. 😄

@Tobias-Fischer
Contributor

Timing out after 12 hours, how fun …

Thanks for pushing this @baszalmstra!

@baszalmstra
Member Author

I bumped the specs of the runners, let's see what happens.

@baszalmstra
Member Author

baszalmstra commented Jul 25, 2024

The CUDA builds succeeded! 🥳🎈

Here are some of the things I want to do before I consider this PR ready:

  • Fix the annoying flaky CPU CI failure with pytorch11.lib missing.
  • Reenable and run tests (but, as on Unix, ignore failures).
  • Readd tests for library files.
  • Check how we can use conda-forge's protobuf.

And finally:

  • Reenable all targets again

Anything I'm missing? @hmaarrfk @h-vetinari

@h-vetinari
Member

Huge congratulations on this milestone! 👏 🚀 🥳

Your todo-list sounds good to me. 👍

Sidenote: pytorch 2.4.0 just got released, depending on how much time you'll still need, we might want to build that first (at least, that shouldn't cause a lot of merge conflicts...).

@baszalmstra
Member Author

Sidenote: pytorch 2.4.0 just got released, depending on how much time you'll still need, we might want to build that first (at least, that shouldn't cause a lot of merge conflicts...).

Yeah, just go ahead and build that first. I don't expect any merge conflicts, and otherwise I'll solve them.

@baszalmstra
Member Author

It looks like pytorch supports split (Python and non-Python) builds out of the box starting with 2.4.1.

That will hopefully also help with our recipe here!

pytorch/pytorch#126328

Comment on lines +147 to +148
@REM Clear the build from any remaining artifacts. We use sccache to avoid recompiling similar code.
cmake --build build --target clean
Member

Why are you cleaning here? Unix builds do not.

Member Author

@baszalmstra baszalmstra Aug 8, 2024

Good question. The reason is that we get into a weird state on Windows where apparently the order of the variant builds for pytorch influences the compilation process. I noticed that this order is non-deterministic. We would run into a weird issue where, if pytorch for Python 3.11 was built before 3.12, we would get linker errors for the 3.12 build, whereas if 3.12 was built before 3.11, all would be good.

For the last two weeks I have been trying to figure out why this happens, and why it specifically only happens for Python 3.11 and 3.12, but I couldn't figure it out. As a last resort, I opted to just remove all artifacts and rebuild them, using sccache to ensure we aren't rebuilding everything from scratch.

This is a little bit of a temporary workaround. As far as I understand, with pytorch 2.4.1 split builds will be supported out of the box and all of the megabuild stuff will become a lot simpler.

If you happen to have a better idea though, I welcome it with open arms! 😄

Member

@isuruf isuruf Aug 8, 2024

Do you have a log with the linker errors?

As far as I understand, with pytorch 2.4.1 split builds will be supported out of the box (pytorch/pytorch#126328) and all of the megabuild stuff will become a lot simpler.

It probably won't be ready for use until 2.5.0. Interesting note: split build is a result of pytorch developers wanting to replicate the libtorch split of conda-forge in wheels.

Member Author

Yes the last time this failed in CI was here: https://github.com/conda-forge/pytorch-cpu-feedstock/actions/runs/10077183574/job/27859126386?pr=231

And interesting indeed! That's great. 👍

Member

Can you print build/CMakeCache.txt ?

@baszalmstra
Member Author

Omg it's green!

I will look at using the shared protobuf binary from conda-forge during the next few days.

@baszalmstra
Member Author

The issue I'm facing with protobuf is this:

FAILED: bin/torch_cpu.dll lib/torch_cpu.lib
  C:\Windows\system32\cmd.exe /C "cd . && C:\Users\zalms\projects\pytorch-cpu-feedstock\output\_build_env\Library\bin\cmake.exe -E vs_link_dll --intdir=caffe2\CMakeFiles\torch_cpu.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests  -- C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\link.exe /nologo @CMakeFiles\torch_cpu.rsp  /out:bin\torch_cpu.dll /implib:lib\torch_cpu.lib /pdb:bin\torch_cpu.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO  -WHOLEARCHIVE:C:/Users/zalms/projects/pytorch-cpu-feedstock/output/work/build/lib/caffe2_protos.lib -WHOLEARCHIVE:C:/Users/zalms/projects/pytorch-cpu-feedstock/output/work/build/lib/onnx.lib && cd ."
  LINK: command "C:\PROGRA~2\MICROS~2\2022\BUILDT~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\link.exe /nologo @CMakeFiles\torch_cpu.rsp /out:bin\torch_cpu.dll /implib:lib\torch_cpu.lib /pdb:bin\torch_cpu.pdb /dll /version:0.0 /machine:x64 /ignore:4049 /ignore:4217 /ignore:4099 /INCREMENTAL:NO -WHOLEARCHIVE:C:/Users/zalms/projects/pytorch-cpu-feedstock/output/work/build/lib/caffe2_protos.lib -WHOLEARCHIVE:C:/Users/zalms/projects/pytorch-cpu-feedstock/output/work/build/lib/onnx.lib /MANIFEST:EMBED,ID=2" failed (exit code 1120) with the following output:
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: void * __cdecl google::protobuf::Arena::AllocateAligned(unsigned __int64,unsigned __int64)" (?AllocateAligned@Arena@protobuf@google@@QEAAPEAX_K0@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: void __cdecl google::protobuf::Arena::ReturnArrayMemory(void *,unsigned __int64)" (?ReturnArrayMemory@Arena@protobuf@google@@AEAAXPEAX_K@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: void * __cdecl google::protobuf::Arena::AllocateAlignedForArray(unsigned __int64,unsigned __int64)" (?AllocateAlignedForArray@Arena@protobuf@google@@AEAAPEAX_K0@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: int __cdecl google::protobuf::internal::CachedSize::Get(void)const " (?Get@CachedSize@internal@protobuf@google@@QEBAHXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: class google::protobuf::Arena * __cdecl google::protobuf::MessageLite::GetArena(void)const " (?GetArena@MessageLite@protobuf@google@@QEBAPEAVArena@23@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "protected: void * const * __cdecl google::protobuf::internal::RepeatedPtrFieldBase::raw_data(void)const " (?raw_data@RepeatedPtrFieldBase@internal@protobuf@google@@IEBAPEBQEAXXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: int __cdecl google::protobuf::internal::RepeatedPtrFieldBase::ExchangeCurrentSize(int)" (?ExchangeCurrentSize@RepeatedPtrFieldBase@internal@protobuf@google@@AEAAHH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: void * * __cdecl google::protobuf::internal::RepeatedPtrFieldBase::elements(void)" (?elements@RepeatedPtrFieldBase@internal@protobuf@google@@AEAAPEAPEAXXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: void * & __cdecl google::protobuf::internal::RepeatedPtrFieldBase::element_at(int)" (?element_at@RepeatedPtrFieldBase@internal@protobuf@google@@AEAAAEAPEAXH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: void const * __cdecl google::protobuf::internal::RepeatedPtrFieldBase::element_at(int)const " (?element_at@RepeatedPtrFieldBase@internal@protobuf@google@@AEBAPEBXH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: int __cdecl google::protobuf::internal::RepeatedPtrFieldBase::allocated_size(void)const " (?allocated_size@RepeatedPtrFieldBase@internal@protobuf@google@@AEBAHXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: struct google::protobuf::internal::RepeatedPtrFieldBase::Rep * __cdecl google::protobuf::internal::RepeatedPtrFieldBase::rep(void)" (?rep@RepeatedPtrFieldBase@internal@protobuf@google@@AEAAPEAURep@1234@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: void __cdecl google::protobuf::internal::RepeatedPtrFieldBase::MaybeExtend(void)" (?MaybeExtend@RepeatedPtrFieldBase@internal@protobuf@google@@AEAAXXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "protected: __cdecl google::protobuf::internal::RepeatedPtrFieldBase::~RepeatedPtrFieldBase(void)" (??1RepeatedPtrFieldBase@internal@protobuf@google@@IEAA@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: void __cdecl google::protobuf::internal::RepeatedPtrFieldBase::InternalSwap(class google::protobuf::internal::RepeatedPtrFieldBase *)" (?InternalSwap@RepeatedPtrFieldBase@internal@protobuf@google@@QEAAXPEAV1234@@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > * __cdecl google::protobuf::internal::TaggedStringPtr::Get(void)const " (?Get@TaggedStringPtr@internal@protobuf@google@@QEBAPEAV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "protected: bool __cdecl google::protobuf::internal::RepeatedPtrFieldBase::NeedsDestroy(void)const " (?NeedsDestroy@RepeatedPtrFieldBase@internal@protobuf@google@@IEBA_NXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: unsigned char * __cdecl google::protobuf::io::EpsCopyOutputStream::WriteRaw(void const *,int,unsigned char *)" (?WriteRaw@EpsCopyOutputStream@io@protobuf@google@@QEAAPEAEPEBXHPEAE@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: unsigned char * __cdecl google::protobuf::io::EpsCopyOutputStream::WriteStringMaybeAliased(unsigned int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,unsigned char *)" (?WriteStringMaybeAliased@EpsCopyOutputStream@io@protobuf@google@@QEAAPEAEIAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@PEAE@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: unsigned char * __cdecl google::protobuf::io::EpsCopyOutputStream::WriteBytesMaybeAliased(unsigned int,class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &,unsigned char *)" (?WriteBytesMaybeAliased@EpsCopyOutputStream@io@protobuf@google@@QEAAPEAEIAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@PEAE@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "private: static unsigned __int64 __cdecl google::protobuf::io::EpsCopyOutputStream::Encode64(unsigned __int64)" (?Encode64@EpsCopyOutputStream@io@protobuf@google@@CA_K_K@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned char * __cdecl google::protobuf::io::CodedOutputStream::WriteVarint32ToArray(unsigned int,unsigned char *)" (?WriteVarint32ToArray@CodedOutputStream@io@protobuf@google@@SAPEAEIPEAE@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned char * __cdecl google::protobuf::io::CodedOutputStream::WriteVarint64ToArray(unsigned __int64,unsigned char *)" (?WriteVarint64ToArray@CodedOutputStream@io@protobuf@google@@SAPEAE_KPEAE@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned char * __cdecl google::protobuf::io::CodedOutputStream::WriteVarint32SignExtendedToArray(int,unsigned char *)" (?WriteVarint32SignExtendedToArray@CodedOutputStream@io@protobuf@google@@SAPEAEHPEAE@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: __cdecl google::protobuf::internal::ArenaStringPtr::ArenaStringPtr(class google::protobuf::Arena *)" (??0ArenaStringPtr@internal@protobuf@google@@QEAA@PEAVArena@23@@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: __cdecl google::protobuf::internal::ArenaStringPtr::ArenaStringPtr(class google::protobuf::Arena *,struct google::protobuf::internal::ArenaStringPtr const &)" (??0ArenaStringPtr@internal@protobuf@google@@QEAA@PEAVArena@23@AEBU0123@@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: void __cdecl google::protobuf::internal::ArenaStringPtr::ClearNonDefaultToEmpty(void)" (?ClearNonDefaultToEmpty@ArenaStringPtr@internal@protobuf@google@@QEAAXXZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: __cdecl google::protobuf::internal::CachedSize::CachedSize(int)" (??0CachedSize@internal@protobuf@google@@QEAA@H@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: void __cdecl google::protobuf::internal::CachedSize::Set(int)" (?Set@CachedSize@internal@protobuf@google@@QEAAXH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "protected: static class google::protobuf::internal::InternalVisibility __cdecl google::protobuf::MessageLite::internal_visibility(void)" (?internal_visibility@MessageLite@protobuf@google@@KA?AVInternalVisibility@internal@23@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned int __cdecl google::protobuf::internal::WireFormatLite::EncodeFloat(float)" (?EncodeFloat@WireFormatLite@internal@protobuf@google@@SAIM@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::Int32Size(int)" (?Int32Size@WireFormatLite@internal@protobuf@google@@SA_KH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::EnumSize(int)" (?EnumSize@WireFormatLite@internal@protobuf@google@@SA_KH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::Int32SizePlusOne(int)" (?Int32SizePlusOne@WireFormatLite@internal@protobuf@google@@SA_KH@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::Int64SizePlusOne(__int64)" (?Int64SizePlusOne@WireFormatLite@internal@protobuf@google@@SA_K_J@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::StringSize(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?StringSize@WireFormatLite@internal@protobuf@google@@SA_KAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::BytesSize(class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > const &)" (?BytesSize@WireFormatLite@internal@protobuf@google@@SA_KAEBV?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: static unsigned __int64 __cdecl google::protobuf::internal::WireFormatLite::LengthDelimitedSize(unsigned __int64)" (?LengthDelimitedSize@WireFormatLite@internal@protobuf@google@@SA_K_K@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: __cdecl google::protobuf::Message::Message(void)" (??0Message@protobuf@google@@QEAA@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "protected: __cdecl google::protobuf::Message::Message(class google::protobuf::Arena *)" (??0Message@protobuf@google@@IEAA@PEAVArena@12@@Z) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: virtual __cdecl google::protobuf::Message::~Message(void)" (??1Message@protobuf@google@@UEAA@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
     Creating library lib\torch_cpu.lib and object lib\torch_cpu.exp
  LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
  caffe2_protos.lib(caffe2.pb.cc.obj) : error LNK2019: unresolved external symbol "private: static struct google::protobuf::internal::ThreadSafeArena::ThreadCache google::protobuf::internal::ThreadSafeArena::thread_cache_" (?thread_cache_@ThreadSafeArena@internal@protobuf@google@@0UThreadCache@1234@A) referenced in function "private: bool __cdecl google::protobuf::internal::ThreadSafeArena::GetSerialArenaFast(class google::protobuf::internal::SerialArena * *)" (?GetSerialArenaFast@ThreadSafeArena@internal@protobuf@google@@AEAA_NPEAPEAVSerialArena@234@@Z)
  caffe2_protos.lib(torch.pb.cc.obj) : error LNK2001: unresolved external symbol "private: static struct google::protobuf::internal::ThreadSafeArena::ThreadCache google::protobuf::internal::ThreadSafeArena::thread_cache_" (?thread_cache_@ThreadSafeArena@internal@protobuf@google@@0UThreadCache@1234@A)
  caffe2_protos.lib(caffe2.pb.cc.obj) : error LNK2001: unresolved external symbol "class google::protobuf::internal::ExplicitlyConstructed<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,8> google::protobuf::internal::fixed_address_empty_string" (?fixed_address_empty_string@internal@protobuf@google@@3V?$ExplicitlyConstructed@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@$07@123@A)
  caffe2_protos.lib(torch.pb.cc.obj) : error LNK2001: unresolved external symbol "class google::protobuf::internal::ExplicitlyConstructed<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> >,8> google::protobuf::internal::fixed_address_empty_string" (?fixed_address_empty_string@internal@protobuf@google@@3V?$ExplicitlyConstructed@V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@$07@123@A)
  caffe2_protos.lib(caffe2.pb.cc.obj) : error LNK2001: unresolved external symbol "protected: static struct google::protobuf::MessageLite::DescriptorMethods const google::protobuf::Message::kDescriptorMethods" (?kDescriptorMethods@Message@protobuf@google@@1UDescriptorMethods@MessageLite@23@B)
  caffe2_protos.lib(torch.pb.cc.obj) : error LNK2001: unresolved external symbol "protected: static struct google::protobuf::MessageLite::DescriptorMethods const google::protobuf::Message::kDescriptorMethods" (?kDescriptorMethods@Message@protobuf@google@@1UDescriptorMethods@MessageLite@23@B)

  bin\torch_cpu.dll : fatal error LNK1120: 3 unresolved externals

It looks like caffe2_protos exports the protobuf symbols. Any thoughts on how to fix this, @h-vetinari?
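For triage, here is a minimal Python sketch that summarizes a linker log like the one above by diagnostic code and originating library. The regular expressions are assumptions derived from the lines in this excerpt, not an official MSVC log grammar, and `summarize_link_log` and `SAMPLE` are hypothetical names introduced only for illustration.

```python
import re
from collections import Counter

# Assumed shape of an MSVC linker diagnostic, based on the excerpt above:
#   <origin> : [fatal ]error|warning LNKnnnn: <message>
DIAG_RE = re.compile(
    r"^\s*(?P<origin>.+?) : (?:fatal )?(?:error|warning) (?P<code>LNK\d+): (?P<msg>.*)$"
)
# LNK2005 messages end with the archive member that already holds the symbol.
DEFINED_IN_RE = re.compile(r"already defined in (?P<lib>\S+)\s*$")


def summarize_link_log(text):
    """Count diagnostics per (code, origin) and per 'already defined in' target."""
    counts = Counter()
    already_defined_in = Counter()
    for line in text.splitlines():
        match = DIAG_RE.match(line)
        if match is None:
            continue  # not a diagnostic line, e.g. "Creating library ..."
        counts[(match["code"], match["origin"])] += 1
        if match["code"] == "LNK2005":
            target = DEFINED_IN_RE.search(match["msg"])
            if target:
                already_defined_in[target["lib"]] += 1
    return counts, already_defined_in


# Three lines copied from the log above; a raw string keeps the backslash literal.
SAMPLE = r'''
  libprotobuf.lib(libprotobuf.dll) : error LNK2005: "public: __cdecl google::protobuf::Message::Message(void)" (??0Message@protobuf@google@@QEAA@XZ) already defined in caffe2_protos.lib(caffe2.pb.cc.obj)
  LINK : warning LNK4098: defaultlib 'LIBCMT' conflicts with use of other libs; use /NODEFAULTLIB:library
  bin\torch_cpu.dll : fatal error LNK1120: 3 unresolved externals
'''

if __name__ == "__main__":
    counts, already_defined_in = summarize_link_log(SAMPLE)
    for (code, origin), n in sorted(counts.items()):
        print(f"{code} from {origin}: {n}")
    for lib, n in already_defined_in.items():
        print(f"duplicates first defined in {lib}: {n}")
```

Run against the full build log, a summary like this would make it easy to check whether every LNK2005 duplicate points at the same archive member, which is what the excerpt suggests for caffe2_protos.lib.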

@h-vetinari
Member

I don't see libprotobuf on the linker line anywhere. Perhaps we need to link it explicitly? It might also be necessary to define the symbol PROTOBUF_USE_DLLS, depending on how protobuf is included.

@hmaarrfk
Contributor

hmaarrfk commented Oct 6, 2024

Sorry for pushing ahead with so many version updates.

I think the build system has "improved", even if some changes required "fixing" on our part.

You might want to update to 2.4.1, or perhaps even to the 2.5.0 pre-releases; there seems to be an rc9 at least.

@baszalmstra
Member Author

I'll give 2.5 a go. I believe the split build is now also part of that?

@hmaarrfk
Contributor

hmaarrfk commented Oct 6, 2024

I'll give 2.5 a go. I believe the split build is now also part of that?

Look at the source; I thought that it was part of 2.4, but I could be wrong. The split build uses a different strategy than we have historically used at conda-forge.

If you want to make use of it, I would first prototype the split build separately from the windows linking issues.

Development

Successfully merging this pull request may close these issues.

Windows builds
10 participants