Pipe blocking when input too large? (Mac OS) #218

Open
andreasabel opened this issue Oct 13, 2021 · 3 comments

@andreasabel
Member

andreasabel commented Oct 13, 2021

In simulating (cat | less) using createProcess and CreatePipe, I am hitting a strange size threshold on the input to cat. This is reduced from a real-world problem.

import System.IO
import System.Process

main = do

  (Just inp, Just out, _, ph1)  <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }
  -- Freezes when 'good' is replaced with 'bad' (one more line, i.e. 64 more bytes)
  hPutStr inp good
  hClose inp

  (_, _, _, ph2) <- createProcess $
    (proc "less" [])
      { std_in = UseHandle out }

  waitForProcess ph1
  waitForProcess ph2
  putStrLn "Program terminated successfully."

  where
  good  = unlines $ replicate 2304 $ replicate 63 'A'
  bad   = unlines $ replicate 2305 $ replicate 63 'A'

Using the good input, less shows up presenting me the 2304 lines of 63 As each.
Using the bad input, nothing shows up, and I can only Ctrl-C.

This is on Mac OS Mojave with GHC 9.0.1 and latest process (1.6.13.2).

In my original setting, the pipe was nroff -man /dev/stdin | less and the threshold was exactly 192 KiB (192 * 1024 bytes).
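For reference, the sizes of the two test inputs can be worked out directly. The sketch below assumes ASCII output (one byte per character); the names bytesPerLine and payloadBytes are illustrative and not part of the report. It only computes sizes and does not establish which OS or handle buffers produce the threshold.

```haskell
-- Each line is 63 'A' characters plus the newline added by 'unlines'.
bytesPerLine :: Int
bytesPerLine = 63 + 1

-- | Size in bytes of @unlines (replicate n (replicate 63 'A'))@,
-- assuming one byte per character (hypothetical helper).
payloadBytes :: Int -> Int
payloadBytes n = n * bytesPerLine

main :: IO ()
main = do
  putStrLn ("good: " ++ show (payloadBytes 2304) ++ " bytes")
  putStrLn ("bad:  " ++ show (payloadBytes 2305) ++ " bytes")
  putStrLn ("good in KiB: " ++ show (payloadBytes 2304 `div` 1024))
  -- Cross-check the formula against the actual string from the report.
  print (length (unlines (replicate 2304 (replicate 63 'A'))) == payloadBytes 2304)
```

So good is 147456 bytes (144 * 1024) and bad is 147520 bytes, one 64-byte line more.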

Note that there is no problem if I let the OS do the piping (using shell "cat | less"):

main = do

  (Just inp, _, _, p) <- createProcess $
    (shell "cat | less")
      { std_in = CreatePipe }
  hPutStr inp bad
  hClose inp

  waitForProcess p
  putStrLn "Program terminated successfully."
@snoyberg
Collaborator

The library is behaving correctly in this case. hPutStr fills an OS pipe buffer and blocks once it is full. If the amount of data is small enough, the blocking never occurs and the second process can start. Once the data is large enough to fill the buffer, hPutStr blocks before less is spawned, so nothing ever drains the pipe.

When you use shell, the shell itself handles the asynchronous aspect of things.

@andreasabel
Member Author

andreasabel commented Oct 13, 2021

Thanks for the explanation! Makes perfect sense now.

Consequently, here is the correct implementation of the pipe, which indeed works as expected:

main = do

  (Just inp, Just out, _, ph1)  <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }
  (_, _, _, ph2) <- createProcess $
    (proc "less" [])
      { std_in = UseHandle out }

  -- Now that both ends of the pipe are set up, writing the input no longer deadlocks:
  hPutStr inp longInput
  hClose inp

  waitForProcess ph1
  waitForProcess ph2
  putStrLn "Program terminated successfully."

  where
  longInput = unlines $ replicate 10000 $ replicate 63 'A'

Alternatively, writing to inp can be done concurrently, so that it does not block the creation of the pipe:

import Control.Concurrent.Async
import System.IO
import System.Process

main = do

  (Just inp, Just out, _, ph1)  <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }

  withAsync (hPutStr inp longInput >> hClose inp) $ \_ -> do

    (_, _, _, ph2) <- createProcess $
      (proc "less" [])
        { std_in = UseHandle out }

    waitForProcess ph1
    waitForProcess ph2
    putStrLn "Program terminated successfully."

  where
  longInput = unlines $ replicate 10000 $ replicate 63 'A'

I wonder whether this example could be added as a tutorial somewhere in the documentation (I can do this if it is welcome). Previously, I had googled a lot but did not find enough examples for CreatePipe.
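As a possible starting point for such a tutorial, here is a minimal sketch of the same concurrent-writer idea using only forkIO from base instead of the async package. To make it checkable without a terminal, it swaps less for wc -c and reads the result back; that swap, and the helper name countViaPipe, are assumptions for illustration and not part of the examples above. It also assumes cat and wc are on the PATH.

```haskell
import Control.Concurrent (forkIO)
import Control.Exception (evaluate)
import System.IO
import System.Process

-- | Push a string through @cat | wc -c@ and return the byte count
-- reported by wc. 'countViaPipe' is a hypothetical helper name.
countViaPipe :: String -> IO Int
countViaPipe input = do
  (Just inp, Just out, _, ph1) <- createProcess $
    (proc "cat" [])
      { std_in  = CreatePipe
      , std_out = CreatePipe
      }

  -- Write from a separate thread, so a full pipe buffer stalls only
  -- the writer and not the code that spawns the consumer.
  _ <- forkIO (hPutStr inp input >> hClose inp)

  (_, Just res, _, ph2) <- createProcess $
    (proc "wc" ["-c"])
      { std_in  = UseHandle out
      , std_out = CreatePipe
      }

  -- Force the whole (small) output of wc before waiting on the processes.
  raw <- hGetContents res
  n   <- evaluate (read (unwords (words raw)))
  _ <- waitForProcess ph1
  _ <- waitForProcess ph2
  return n

main :: IO ()
main = do
  let longInput = unlines $ replicate 10000 $ replicate 63 'A'
  n <- countViaPipe longInput
  putStrLn ("wc counted " ++ show n ++ " bytes")
```

With the 10000-line input this should report 640000 bytes (10000 lines of 64 bytes each). Unlike withAsync, a bare forkIO does not cancel the writer or propagate its exceptions, so this is a sketch rather than production code.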

@snoyberg
Collaborator

I’m certainly up for such a doc addition. At the very least, a “make sure you don’t deadlock by filling buffers” note with a link to this issue would be great.
