Implement downloads #416

Merged 2 commits on Nov 8, 2023.

4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,10 @@
### Added
- `Ferrum::Page#disable_javascript` disables the JavaScript from the HTML source
- `Ferrum::Page#set_viewport` emulates the viewport
+- `Ferrum::Downloads`
+  - `#files` information about downloaded files
+  - `#wait` wait for file download to be completed
+  - `#set_behavior` where and whether to store file

### Changed

2 changes: 1 addition & 1 deletion Gemfile
@@ -7,7 +7,7 @@ gem "chunky_png", "~> 1.3"
gem "image_size", "~> 2.0"
gem "kramdown", "~> 2.0", require: false
gem "pdf-reader", "~> 2.2"
-gem "puma", "~> 4.1"
+gem "puma", ">= 5.6.7"
gem "rake", "~> 13.0"
gem "redcarpet", require: false, platform: :mri
gem "rspec", "~> 3.8"
85 changes: 65 additions & 20 deletions README.md
@@ -35,8 +35,8 @@ based on Ferrum and Mechanize.
* [Navigation](https://github.com/rubycdp/ferrum#navigation)
* [Finders](https://github.com/rubycdp/ferrum#finders)
* [Screenshots](https://github.com/rubycdp/ferrum#screenshots)
-* [Cleaning Up](https://github.com/rubycdp/ferrum#cleaning-up)
* [Network](https://github.com/rubycdp/ferrum#network)
+* [Downloads](https://github.com/rubycdp/ferrum#downloads)
* [Proxy](https://github.com/rubycdp/ferrum#proxy)
* [Mouse](https://github.com/rubycdp/ferrum#mouse)
* [Keyboard](https://github.com/rubycdp/ferrum#keyboard)
@@ -49,6 +49,7 @@ based on Ferrum and Mechanize.
* [Animation](https://github.com/rubycdp/ferrum#animation)
* [Node](https://github.com/rubycdp/ferrum#node)
* [Tracing](https://github.com/rubycdp/ferrum#tracing)
+* [Clean Up](https://github.com/rubycdp/ferrum#clean-up)
* [Thread safety](https://github.com/rubycdp/ferrum#thread-safety)
* [Development](https://github.com/rubycdp/ferrum#development)
* [Contributing](https://github.com/rubycdp/ferrum#contributing)
@@ -411,25 +412,6 @@ browser.mhtml(path: "google.mhtml") # => 87742
```


-## Cleaning Up
-
-#### reset
-
-Closes browser tabs opened by the `Browser` instance.
-
-```ruby
-# connect to a long-running Chrome process
-browser = Ferrum::Browser.new(url: 'http://localhost:9222')
-
-browser.go_to("https://github.com/")
-
-# clean up, lest the tab stays there hanging forever
-browser.reset
-
-browser.quit
-```
-
-
## Network

`browser.network`
@@ -608,6 +590,50 @@ Toggles ignoring cache for each request. If true, cache will not be used.
browser.network.cache(disable: true)
```

+
+## Downloads
+
+`browser.downloads`
+
+#### files `Array<Hash>`
+
+Returns information about all downloaded files, one `Hash` per file.
+
+```ruby
+browser.go_to("http://localhost/attachment.pdf")
+browser.downloads.files # => [{"frameId"=>"E3316DF1B5383D38F8ADF7485005FDE3", "guid"=>"11a68745-98ac-4d54-9b57-9f9016c268b3", "url"=>"http://localhost/attachment.pdf", "suggestedFilename"=>"attachment.pdf", "totalBytes"=>4911, "receivedBytes"=>4911, "state"=>"completed"}]
+```
+
+#### wait(timeout)
+
+Waits until the download is finished, for at most `timeout` seconds (5 by default).
+
+```ruby
+browser.go_to("http://localhost/attachment.pdf")
+browser.downloads.wait
+```
+
+or, with a block that triggers the download:
+
+```ruby
+browser.go_to("http://localhost/page")
+browser.downloads.wait { browser.at_css("#download").click }
+```
+
+#### set_behavior(\*\*options)
+
+Sets whether and where downloaded files are stored.
+
+* options `Hash`
+  * :save_path `String` (required) absolute path of the directory where files are saved
+  * :behavior `Symbol` `deny | allow | allowAndName | default`, `allow` by default
+
+```ruby
+browser.go_to("https://example.com/")
+browser.downloads.set_behavior(save_path: "/tmp", behavior: :allow)
+```
+
+
## Proxy

You can set a proxy with a `:proxy` option:
@@ -1210,6 +1236,25 @@ Accepts block, records trace and by default returns trace data from `Tracing.tra
only one trace config can be active at a time per browser.


+## Clean Up
+
+#### reset
+
+Closes browser tabs opened by the `Browser` instance.
+
+```ruby
+# connect to a long-running Chrome process
+browser = Ferrum::Browser.new(url: 'http://localhost:9222')
+
+browser.go_to("https://github.com/")
+
+# clean up, lest the tab stays there hanging forever
+browser.reset
+
+browser.quit
+```
+
+
## Thread safety ##

Ferrum is fully thread-safe. You can create one browser or a few as you wish and
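Each element returned by `Downloads#files` is the `Hash` from `Browser.downloadWillBegin` merged with the latest `Browser.downloadProgress` fields, as in the README example. A minimal sketch of post-processing that array, assuming entries shaped like that example. No browser is started, the second entry is made-up sample data, and `completed_filenames` and `progress_ratio` are hypothetical helpers, not part of Ferrum:

```ruby
# Sample entries shaped like the README output of `browser.downloads.files`.
# The second entry is hypothetical, added to show an unfinished download.
FILES = [
  {
    "frameId" => "E3316DF1B5383D38F8ADF7485005FDE3",
    "guid" => "11a68745-98ac-4d54-9b57-9f9016c268b3",
    "url" => "http://localhost/attachment.pdf",
    "suggestedFilename" => "attachment.pdf",
    "totalBytes" => 4911,
    "receivedBytes" => 4911,
    "state" => "completed"
  },
  {
    "guid" => "hypothetical-guid",
    "url" => "http://localhost/archive.zip",
    "suggestedFilename" => "archive.zip",
    "totalBytes" => 1000,
    "receivedBytes" => 250,
    "state" => "inProgress"
  }
].freeze

# Hypothetical helper: suggested names of downloads that completed.
def completed_filenames(files)
  files.select { |file| file["state"] == "completed" }
       .map { |file| file["suggestedFilename"] }
end

# Hypothetical helper: fraction of bytes received, or nil when the total
# is 0 (assumed here to mean the size is unknown).
def progress_ratio(file)
  total = file["totalBytes"].to_i
  return nil if total.zero?

  file["receivedBytes"].to_f / total
end

p completed_filenames(FILES)                # => ["attachment.pdf"]
p FILES.map { |file| progress_ratio(file) } # => [1.0, 0.25]
```

In real code, `FILES` would be replaced by `browser.downloads.files`, typically read after `browser.downloads.wait`.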
2 changes: 1 addition & 1 deletion lib/ferrum/browser.rb
@@ -20,7 +20,7 @@ class Browser
delegate %i[go_to goto go back forward refresh reload stop wait_for_reload
at_css at_xpath css xpath current_url current_title url title
body doctype content=
-headers cookies network
+headers cookies network downloads
mouse keyboard
screenshot pdf mhtml viewport_size device_pixel_ratio
frames frame_by main_frame
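Adding `downloads` to this list is what makes `browser.downloads` available: the methods are forwarded with Ruby's standard `Forwardable` module. The hunk cuts off before the delegation target; assuming, as for the other entries, that it is the current page, `browser.downloads` and `browser.page.downloads` are the same object. A minimal sketch of that delegation with hypothetical stand-in classes (`DemoBrowser`, `DemoPage`, `DemoDownloads`), not Ferrum's real ones:

```ruby
require "forwardable"

# Hypothetical stand-in for Ferrum::Downloads.
class DemoDownloads
  def files
    []
  end
end

# Hypothetical stand-in for Ferrum::Page: each page owns one Downloads object.
class DemoPage
  attr_reader :downloads

  def initialize
    @downloads = DemoDownloads.new
  end
end

# Hypothetical stand-in for Ferrum::Browser: forwards `downloads` to `page`.
class DemoBrowser
  extend Forwardable
  delegate %i[downloads] => :page

  attr_reader :page

  def initialize
    @page = DemoPage.new
  end
end

browser = DemoBrowser.new
p browser.downloads.equal?(browser.page.downloads) # => true
p browser.downloads.files                          # => []
```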
60 changes: 60 additions & 0 deletions lib/ferrum/downloads.rb
@@ -0,0 +1,60 @@
# frozen_string_literal: true

module Ferrum
  class Downloads
    VALID_BEHAVIOR = %i[deny allow allowAndName default].freeze

    def initialize(page)
      @page = page
      @event = Event.new.tap(&:set)
      @files = {}
    end

    def files
      @files.values
    end

    def wait(timeout = 5)
      @event.reset
      yield if block_given?
      @event.wait(timeout)
      @event.set
    end

    def set_behavior(save_path:, behavior: :allow)
      raise ArgumentError unless VALID_BEHAVIOR.include?(behavior.to_sym)
      raise Error, "supply absolute path for `:save_path` option" unless Pathname.new(save_path.to_s).absolute?

      @page.command("Browser.setDownloadBehavior",
                    browserContextId: @page.context.id,
                    downloadPath: save_path,
                    behavior: behavior,
                    eventsEnabled: true)
    end

    def subscribe
      subscribe_download_will_begin
      subscribe_download_progress
    end

    def subscribe_download_will_begin
      @page.on("Browser.downloadWillBegin") do |params|
        @event.reset
        @files[params["guid"]] = params
      end
    end

    def subscribe_download_progress
      @page.on("Browser.downloadProgress") do |params|
        @files[params["guid"]].merge!(params)

        case params["state"]
        when "completed", "canceled"
          @event.set
        else
          @event.reset
        end
      end
    end
  end
end
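`Downloads` keeps one entry per download `guid`: `Browser.downloadWillBegin` creates it, and every `Browser.downloadProgress` event merges the latest byte counts and `state` into it, setting the event once the state is `completed` or `canceled`. The sketch below replays that bookkeeping without Chrome by feeding hand-written payloads into equivalent handlers. `DownloadLog` and the payloads are hypothetical, and a simplified `finished?` check stands in for the `Event`:

```ruby
# Hypothetical stand-in for the bookkeeping in Ferrum::Downloads,
# driven by plain Hashes instead of real CDP events.
class DownloadLog
  FINISHED = %w[completed canceled].freeze

  def initialize
    @files = {}
  end

  # Mirrors the "Browser.downloadWillBegin" handler: start tracking a guid.
  def will_begin(params)
    @files[params["guid"]] = params.dup
  end

  # Mirrors the "Browser.downloadProgress" handler: merge the latest fields.
  # Progress for an unknown guid is ignored here rather than raising.
  def progress(params)
    @files[params["guid"]]&.merge!(params)
  end

  # True once every tracked download has completed or been canceled.
  def finished?
    @files.values.all? { |file| FINISHED.include?(file["state"]) }
  end

  def files
    @files.values
  end
end

log = DownloadLog.new
guid = "11a68745-98ac-4d54-9b57-9f9016c268b3"
log.will_begin("guid" => guid, "url" => "http://localhost/attachment.pdf",
               "suggestedFilename" => "attachment.pdf")
log.progress("guid" => guid, "totalBytes" => 4911, "receivedBytes" => 1024, "state" => "inProgress")
p log.finished? # => false
log.progress("guid" => guid, "totalBytes" => 4911, "receivedBytes" => 4911, "state" => "completed")
p log.finished? # => true
```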
17 changes: 17 additions & 0 deletions lib/ferrum/event.rb
@@ -0,0 +1,17 @@
# frozen_string_literal: true

module Ferrum
  class Event < Concurrent::Event
    def iteration
      synchronize { @iteration }
    end

    def reset
      synchronize do
        @iteration += 1
        @set = false if @set
        @iteration
      end
    end
  end
end
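`Downloads#wait` relies on the order reset, then yield, then wait: the event is cleared before the block triggers the download, so a download that finishes very quickly sets the event after the reset instead of being wiped out by it. A minimal sketch of that pattern using only the standard library. `SimpleEvent` is a hypothetical, reduced stand-in for `Concurrent::Event` (no `iteration` counter), and a thread plays the role of the `Browser.downloadProgress` handler:

```ruby
# Hypothetical minimal event: a flag guarded by a Mutex and ConditionVariable.
class SimpleEvent
  def initialize
    @mutex = Mutex.new
    @cond = ConditionVariable.new
    @set = false
  end

  def set
    @mutex.synchronize do
      @set = true
      @cond.broadcast
    end
  end

  def reset
    @mutex.synchronize { @set = false }
  end

  # Blocks until set or until `timeout` seconds pass; returns whether it was set.
  def wait(timeout)
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
    @mutex.synchronize do
      until @set
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        break if remaining <= 0

        @cond.wait(@mutex, remaining)
      end
      @set
    end
  end
end

# Same order as Downloads#wait: reset first, then trigger, then wait.
def wait_for(event, timeout = 5)
  event.reset
  yield if block_given?
  event.wait(timeout)
end

event = SimpleEvent.new
# The block "starts a download" that completes on another thread.
finished = wait_for(event, 2) { Thread.new { sleep 0.1; event.set } }
p finished # => true

# Nothing sets the event this time, so it returns false after the timeout.
p wait_for(event, 0.2) # => false
```

Unlike `Downloads#wait`, which always calls `set` afterwards, this helper returns whether the event fired, which makes a timeout visible to the caller.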
44 changes: 16 additions & 28 deletions lib/ferrum/page.rb
@@ -1,12 +1,14 @@
# frozen_string_literal: true

require "forwardable"
+require "ferrum/event"
require "ferrum/mouse"
require "ferrum/keyboard"
require "ferrum/headers"
require "ferrum/cookies"
require "ferrum/dialog"
require "ferrum/network"
+require "ferrum/downloads"
require "ferrum/page/frames"
require "ferrum/page/screenshot"
require "ferrum/page/animation"
@@ -18,20 +20,6 @@ module Ferrum
class Page
GOTO_WAIT = ENV.fetch("FERRUM_GOTO_WAIT", 0.1).to_f

-class Event < Concurrent::Event
-def iteration
-synchronize { @iteration }
-end
-
-def reset
-synchronize do
-@iteration += 1
-@set = false if @set
-@iteration
-end
-end
-end
-
extend Forwardable
delegate %i[at_css at_xpath css xpath
current_url current_title url title body doctype content=
@@ -71,6 +59,11 @@ def reset
# @return [Cookies]
attr_reader :cookies

+# Downloads object.
+#
+# @return [Downloads]
+attr_reader :downloads
+
def initialize(target_id, browser, proxy: nil)
@frames = Concurrent::Map.new
@main_frame = Frame.new(nil, self)
@@ -91,6 +84,7 @@ def initialize(target_id, browser, proxy: nil)
@cookies = Cookies.new(self)
@network = Network.new(self)
@tracing = Tracing.new(self)
+@downloads = Downloads.new(self)

subscribe
prepare_page
@@ -114,8 +108,10 @@ def go_to(url = nil)
options = { url: combine_url!(url) }
options.merge!(referrer: referrer) if referrer
response = command("Page.navigate", wait: GOTO_WAIT, **options)
-error_text = response["errorText"]
-raise StatusError.new(options[:url], "Request to #{options[:url]} failed (#{error_text})") if error_text
+error_text = response["errorText"] # https://cs.chromium.org/chromium/src/net/base/net_error_list.h
+if error_text && error_text != "net::ERR_ABORTED" # Request aborted due to user action or download
+raise StatusError.new(options[:url], "Request to #{options[:url]} failed (#{error_text})")
+end

response["frameId"]
rescue TimeoutError
@@ -259,9 +255,9 @@ def forward
history_navigate(delta: 1)
end

-def wait_for_reload(sec = 1)
+def wait_for_reload(timeout = 1)
@event.reset if @event.set?
-@event.wait(sec)
+@event.wait(timeout)
@event.set
end

@@ -356,6 +352,7 @@ def document_node_id
def subscribe
frames_subscribe
network.subscribe
+downloads.subscribe

if @browser.options.logger
on("Runtime.consoleAPICalled") do |params|
@@ -398,16 +395,7 @@ def prepare_page
end
end

-if @browser.options.save_path
-unless Pathname.new(@browser.options.save_path).absolute?
-raise Error, "supply absolute path for `:save_path` option"
-end
-
-@browser.command("Browser.setDownloadBehavior",
-browserContextId: context.id,
-downloadPath: @browser.options.save_path,
-behavior: "allow", eventsEnabled: true)
-end
+downloads.set_behavior(save_path: @browser.options.save_path) if @browser.options.save_path

@browser.extensions.each do |extension|
command("Page.addScriptToEvaluateOnNewDocument", source: extension)
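The new `go_to` check no longer treats `net::ERR_ABORTED` as a failure, because, as the inline comment notes, Chrome also reports that error when a navigation turns into a download (for example `attachment.pdf` above). A minimal sketch of that decision in isolation; `navigation_failed?` and `IGNORED_NAVIGATION_ERRORS` are hypothetical names, and the response hashes are made-up examples shaped like a `Page.navigate` result:

```ruby
# Hypothetical mirror of the check in Page#go_to: a navigation counts as
# failed only if Chrome returned an errorText other than "net::ERR_ABORTED".
IGNORED_NAVIGATION_ERRORS = ["net::ERR_ABORTED"].freeze

def navigation_failed?(response)
  error_text = response["errorText"]
  !error_text.nil? && !IGNORED_NAVIGATION_ERRORS.include?(error_text)
end

p navigation_failed?({ "frameId" => "ABC" })                                    # => false
p navigation_failed?({ "frameId" => "ABC", "errorText" => "net::ERR_ABORTED" }) # => false
p navigation_failed?({ "errorText" => "net::ERR_NAME_NOT_RESOLVED" })           # => true
```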
1 change: 1 addition & 0 deletions spec/browser_spec.rb
@@ -246,6 +246,7 @@
skip "https://bugs.chromium.org/p/chromium/issues/detail?id=1444729" if browser.headless_new?

browser.go_to("/#{filename}")
+browser.downloads.wait

expect(File.exist?("#{save_path}/#{filename}")).to be true
ensure