Fix a segmentation fault bug when used with DPDK. #429

Open. boat0 wants to merge 2 commits into master.

@boat0 boat0 commented Mar 28, 2018

When DPDK is used we have:
    lkl_hijack_netdev_create()
      --> lkl_netdev_dpdk_create()
        --> rte_eal_init()
          --> rte_eal_intr_init()

rte_eal_intr_init() creates threads that use lkl_ops, but lkl_ops is only
initialized later, in the call to lkl_start_kernel(). As a result, LKL
crashes with SIGSEGV.

Fix by swapping the order of the calls to lkl_hijack_netdev_create() and
lkl_start_kernel().

Signed-off-by: Xiaozhou Liu [email protected]
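
The ordering hazard described in this PR can be illustrated with a minimal sketch. All names here (`demo_ops`, `demo_start_kernel`, `demo_early_user`) are hypothetical stand-ins and not LKL or DPDK APIs; the only assumption is that some code path dereferences an operations table before the step that sets it up has run. The sketch checks the pointer instead of crashing, so it stays runnable.

```c
#include <stddef.h>

/* Hypothetical stand-in for an operations table such as lkl_ops. */
struct demo_ops {
	int (*print)(const char *msg);
};

/* Trivial implementation: succeeds for any non-NULL message. */
static int demo_print(const char *msg)
{
	return msg ? 0 : -1;
}

static struct demo_ops demo_ops_impl = { .print = demo_print };

/* NULL until demo_start_kernel() has run. */
static struct demo_ops *demo_ops;

/* Stand-in for the step that initializes the table. */
static void demo_start_kernel(void)
{
	demo_ops = &demo_ops_impl;
}

/*
 * Stand-in for code that runs early and wants to use the table.
 * Without the NULL check, calling this before demo_start_kernel()
 * would dereference a NULL pointer.
 */
static int demo_early_user(void)
{
	if (!demo_ops)
		return -1; /* table not ready yet */
	return demo_ops->print("hello");
}
```

Calling `demo_early_user()` before `demo_start_kernel()` returns -1 here; after initialization it returns 0. Reordering the calls, as this PR proposes, or guarding the access are the two usual ways out of this kind of hazard.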


@lkl-jenkins

Can one of the admins verify this patch?

@tavip tavip requested a review from thehajime March 29, 2018 13:19
@thehajime
Member

Thanks for the patch!
I will review it later (sorry for the late response).

By the way, which DPDK version are you using when the crash occurs?

@boat0
Author

boat0 commented Apr 3, 2018

I am using the default DPDK version, 17.02, that comes with LKL.

@thehajime
Member

@boat0 Sorry for the late response.

Calling lkl_hijack_netdev_create() before lkl_start_kernel() was meant to add the kernel boot parameter (below) before lkl_start_kernel() runs. It looks like LKL can add virtio devices after lkl_running=1 (though it hangs on my local machine), but the original code should work fine.

[    0.000000] Kernel command line:  virtio_mmio.device=268@0x1000000:1
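
For reference, the Linux kernel documents this parameter as `virtio_mmio.device=<size>@<baseaddr>:<irq>[:<id>]`, so the line above describes a 268-byte region at 0x1000000 using IRQ 1. A minimal sketch of composing such a string follows; `demo_format_virtio_mmio` is a hypothetical helper for illustration and is not how LKL necessarily builds its command line.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: writes "virtio_mmio.device=<size>@<base>:<irq>"
 * into buf. Returns snprintf()'s result, i.e. the length the full
 * string needs (excluding the terminating NUL), or a negative value
 * on an output error.
 */
static int demo_format_virtio_mmio(char *buf, size_t len,
				   unsigned long size,
				   unsigned long base,
				   unsigned int irq)
{
	return snprintf(buf, len, "virtio_mmio.device=%lu@0x%lx:%u",
			size, base, irq);
}
```

Because such a parameter is part of the boot command line, it has to be composed before the kernel starts, which is the reason given above for the original call order.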

I can reproduce the segfault and found that the crash is not due to pthread_create in rte_eal_intr_init, but due to commit 8f6b03d, which I authored :(

commit 8f6b03d865d0fac79e9d9e011196e62df070bdce
Author: Hajime Tazaki <[email protected]>
Date:   Thu Feb 8 08:09:40 2018 +0900

    lkl: epoll_wait demultiplex between host and lkl

Applying the following patch seems to fix the issue, but I need to think more about what's the right way to do this.

diff --git a/tools/lkl/lib/hijack/hijack.c b/tools/lkl/lib/hijack/hijack.c
index 95bbcf7f549e..664f2018407d 100644
--- a/tools/lkl/lib/hijack/hijack.c
+++ b/tools/lkl/lib/hijack/hijack.c
@@ -421,6 +421,10 @@ int epoll_wait(int epfd, struct epoll_event *events,
        void *trv_val;
        int i, ret, ret_lkl, ret_host;
 
+       if (!lkl_running){
+               return host_epoll_wait(epfd, events, maxevents, timeout);
+       }
+
        ret = lkl_call(__lkl__NR_pipe, 1, l_pipe);
        if (ret == -1) {
                fprintf(stderr, "lkl pipe error(errno=%d)\n", errno);

Since I'm not yet able to run an app with DPDK, could you try this patch on your machine to see whether it works?

Thanks.
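
The guard in the patch above follows a common interposition pattern: until the library's own state is ready, forward the call to the host implementation. A minimal sketch of that dispatch, under stated assumptions: `demo_running`, `demo_host_wait`, `demo_lkl_wait`, and `demo_wait` are hypothetical names used only for illustration, and the real epoll_wait() in hijack.c does considerably more than this.

```c
#include <stddef.h>

/* Hypothetical flag mirroring the role of lkl_running in the patch. */
static int demo_running;

/* Return codes that identify which path handled the call. */
enum { DEMO_HOST_PATH = 1, DEMO_LKL_PATH = 2 };

/* Stand-in for the host implementation (host_epoll_wait in the patch). */
static int demo_host_wait(int fd, int timeout)
{
	(void)fd;
	(void)timeout;
	return DEMO_HOST_PATH;
}

/* Stand-in for the path that needs the kernel to be up. */
static int demo_lkl_wait(int fd, int timeout)
{
	(void)fd;
	(void)timeout;
	return DEMO_LKL_PATH;
}

/*
 * Interposed entry point: before the kernel is running, nothing on the
 * LKL side can be used safely, so the call goes straight to the host.
 */
static int demo_wait(int fd, int timeout)
{
	if (!demo_running)
		return demo_host_wait(fd, timeout);
	return demo_lkl_wait(fd, timeout);
}
```

With this shape, threads created during early initialization (for example by a library such as DPDK) that happen to call the interposed function never reach code that depends on uninitialized state.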
