class Mechanize

The Mechanize library is used for automating interactions with a website. It can follow links and submit forms. Form fields can be populated and submitted. A history of URLs is maintained and can be queried.

Example

require 'mechanize'
require 'logger'

agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari'

page = agent.get "http://www.google.com/"
search_form = page.form_with :name => "f"
search_form.field_with(:name => "q").value = "Hello"

search_results = agent.submit search_form
puts search_results.body

Issues with mechanize

If you think you have a bug with mechanize, but aren't sure, please file a ticket at github.com/sparklemotion/mechanize/issues

Here are some common problems you may experience with mechanize

Problems connecting to SSL sites

Mechanize defaults to validating SSL certificates using the default CA certificates for your platform. At this time, Windows users do not have integration between the OS default CA certificates and OpenSSL. cert_store explains how to download and use Mozilla's CA certificates to allow SSL sites to work.

Problems with content-length

Some sites return an incorrect content-length value. Unlike a browser, mechanize raises an error when the content-length header does not match the response length since it does not know if there was a connection problem or if the mismatch is a server bug.

The error raised, Mechanize::ResponseReadError, can be converted to a parsed Page, File, etc. depending upon the content-type:

agent = Mechanize.new
uri = URI 'http://example/invalid_content_length'

begin
  page = agent.get uri
rescue Mechanize::ResponseReadError => e
  page = e.force_parse
end

Constants

AGENT_ALIASES

Supported User-Agent aliases for use with user_agent_alias=. The description in parentheses is for informative purposes and is not part of the alias name.

  • Linux Firefox (26.0 on Ubuntu Linux)

  • Linux Konqueror (3)

  • Linux Mozilla

  • Mac Firefox (26.0)

  • Mac Mozilla

  • Mac Safari (7.0.1 on OS X 10.9)

  • Mac Safari 4

  • Mechanize (default)

  • Windows IE 6

  • Windows IE 7

  • Windows IE 8

  • Windows IE 9

  • Windows IE 10 (Windows 8.1 64bit)

  • Windows IE 11 (Windows 8.1 64bit)

  • Windows Mozilla

  • Windows Firefox (26.0)

  • iPhone (iOS 7.0.4)

  • iPad (iOS 7.0.4)

  • Android (4.4.2)

Example:

agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'

Public Class Methods

new(connection_name = 'mechanize') { |self| ... }

Creates a new mechanize instance. If a block is given, the created instance is yielded to the block for setting up pre-connection state such as SSL parameters or proxies:

agent = Mechanize.new do |a|
  a.proxy_host = 'proxy.example'
  a.proxy_port = 8080
end

If you need segregated SSL connections, give each agent a unique name. Otherwise the connections will be shared. This is particularly important if you are using certificates.

agent_1 = Mechanize.new 'conn1'
agent_2 = Mechanize.new 'conn2'

# File lib/mechanize.rb, line 188
def initialize(connection_name = 'mechanize')
  @agent = Mechanize::HTTP::Agent.new(connection_name)
  @agent.context = self
  @log = nil

  # attr_accessors
  @agent.user_agent = AGENT_ALIASES['Mechanize']
  @watch_for_set    = nil
  @history_added    = nil

  # attr_readers
  @pluggable_parser = PluggableParser.new

  @keep_alive_time  = 0

  # Proxy
  @proxy_addr = nil
  @proxy_port = nil
  @proxy_user = nil
  @proxy_pass = nil

  @html_parser = self.class.html_parser

  @default_encoding = nil
  @force_default_encoding = false

  # defaults
  @agent.max_history = 50

  yield self if block_given?

  @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
end

start() { |instance| ... }

Creates a new Mechanize instance and yields it to the given block.

After the block executes, the instance is cleaned up. This includes closing all open connections.

Mechanize.start do |m|
  m.get("http://example.com")
end

# File lib/mechanize.rb, line 163
def self.start
  instance = new
  yield(instance)
ensure
  instance.shutdown
end

History

Public Instance Methods

back()

Equivalent to the browser back button. Returns the previous page visited.

# File lib/mechanize.rb, line 229
def back
  @agent.history.pop
end

current_page()

Returns the latest page loaded by Mechanize

# File lib/mechanize.rb, line 236
def current_page
  @agent.current_page
end
Also aliased as: page

history()

The history of this mechanize run

# File lib/mechanize.rb, line 245
def history
  @agent.history
end

max_history()

Maximum number of items allowed in the history. The default setting is 50 pages. Note that the size of the history multiplied by the maximum response body size is an approximate upper limit on the memory mechanize uses to store response bodies; see max_file_buffer=.

# File lib/mechanize.rb, line 254
def max_history
  @agent.history.max_size
end

max_history=(length)

Sets the maximum number of items allowed in the history to length.

Setting the maximum history length to nil will make the history size unlimited. Take care when doing this: mechanize stores response bodies in memory for pages and in the temporary files directory for other responses, so for a long-running mechanize program the history can grow quite large.

See also the discussion under max_file_buffer=

# File lib/mechanize.rb, line 268
def max_history= length
  @agent.history.max_size = length
end

page()
Alias for: current_page

visited?(url)

Returns a visited page for the url passed in, otherwise nil

# File lib/mechanize.rb, line 275
def visited? url
  url = url.href if url.respond_to? :href

  @agent.visited_page url
end
Also aliased as: visited_page

visited_page(url)

Returns the visited page for url if it has been visited, otherwise nil, so the result can also be used as a boolean

Alias for: visited?

Hooks

Attributes

history_added[RW]

Callback which is invoked with the page that was added to history.

Public Instance Methods

content_encoding_hooks()

A list of hooks to call before reading response header 'content-encoding'.

The hook is called with the agent making the request, the URI of the request, the response, and an IO containing the response body.

# File lib/mechanize.rb, line 296
def content_encoding_hooks
  @agent.content_encoding_hooks
end

post_connect_hooks()

A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
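
A hook can be any object that responds to call. The sketch below simulates how a list of post-connect hooks is invoked, using a plain Array in place of agent.post_connect_hooks so that it runs without a network connection. The fake_response struct and the record_status lambda are hypothetical names used only for illustration.

```ruby
require 'uri'

# Stand-in for a response object; hypothetical, for illustration only.
fake_response = Struct.new(:code)

seen = []

# A post-connect hook receives the agent, the URI, the response and the body.
record_status = lambda do |agent, uri, response, body|
  seen << [uri.host, response.code, body.size]
end

# With a real agent this would be: agent.post_connect_hooks << record_status
hooks = [record_status]

# Simulate what happens after a response has been retrieved.
hooks.each do |hook|
  hook.call(nil, URI('http://example.com/'), fake_response.new('200'), 'hello')
end

p seen
```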

# File lib/mechanize.rb, line 309
def post_connect_hooks
  @agent.post_connect_hooks
end

pre_connect_hooks()

A list of hooks to call before retrieving a response. Hooks are called with the agent and the request to be performed.

# File lib/mechanize.rb, line 317
def pre_connect_hooks
  @agent.pre_connect_hooks
end

Requests

Public Instance Methods

click(link)

If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched.

# File lib/mechanize.rb, line 330
def click link
  case link
  when Page::Link then
    referer = link.page || current_page()
    if @agent.robots
      if (referer.is_a?(Page) and referer.parser.nofollow?) or
         link.rel?('nofollow') then
        raise RobotsDisallowedError.new(link.href)
      end
    end
    if link.noreferrer?
      href = @agent.resolve(link.href, link.page || current_page)
      referer = Page.new
    else
      href = link.href
    end
    get href, [], referer
  when String, Regexp then
    if real_link = page.link_with(:text => link)
      click real_link
    else
      button = nil
      # Note that this will not work if we have since navigated to a different page.
      # Should rather make each button aware of its parent form.
      form = page.forms.find do |f|
        button = f.button_with(:value => link)
        button.is_a? Form::Submit
      end
      submit form, button if form
    end
  when Form::Submit, Form::ImageButton then
    # Note that this will not work if we have since navigated to a different page.
    # Should rather make each button aware of its parent form.
    form = page.forms.find do |f|
      f.buttons.include?(link)
    end
    submit form, link if form
  else
    referer = current_page()
    href = link.respond_to?(:href) ? link.href :
      (link['href'] || link['src'])
    get href, [], referer
  end
end

delete(uri, query_params = {}, headers = {})

Sends a DELETE request to uri with query_params and the given headers:

query_params is formatted into a query string using Mechanize::Util.build_query_string, which see.

delete('http://example/', {'q' => 'foo'}, {})

# File lib/mechanize.rb, line 424
def delete(uri, query_params = {}, headers = {})
  page = @agent.fetch(uri, :delete, headers, query_params)
  add_to_history(page)
  page
end

download(uri, io_or_filename, parameters = [], referer = nil, headers = {})

GETs uri and writes it to io_or_filename without recording the request in the history. If io_or_filename does not respond to write it will be used as a file name. parameters, referer and headers are used as in get.

By default, if the Content-type of the response matches a Mechanize::File or Mechanize::Page parser, the response body will be loaded into memory before being saved. See pluggable_parser for details on changing this default.

For alternate ways of downloading files see Mechanize::FileSaver and Mechanize::DirectorySaver.

# File lib/mechanize.rb, line 389
def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
  page = transact do
    get uri, parameters, referer, headers
  end

  io = if io_or_filename.respond_to? :write then
         io_or_filename
       else
         open io_or_filename, 'wb'
       end

  case page
  when Mechanize::File then
    io.write page.body
  else
    body_io = page.body_io

    until body_io.eof? do
      io.write body_io.read 16384
    end
  end

  page
ensure
  io.close if io and not io_or_filename.respond_to? :write
end

get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }

GET the uri with the given request parameters, referer and headers.

The referer may be a URI or a page.

parameters is formatted into a query string using Mechanize::Util.build_query_string, which see.

# File lib/mechanize.rb, line 439
def get(uri, parameters = [], referer = nil, headers = {})
  method = :get

  referer ||=
    if uri.to_s =~ %r{\Ahttps?://}
      Page.new
    else
      current_page || Page.new
    end

  # FIXME: Huge hack so that using a URI as a referer works.  I need to
  # refactor everything to pass around URIs but still support
  # Mechanize::Page#base
  unless Mechanize::Parser === referer then
    referer = if referer.is_a?(String) then
                Page.new URI(referer)
              else
                Page.new referer
              end
  end

  # fetch the page
  headers ||= {}
  page = @agent.fetch uri, method, headers, parameters, referer
  add_to_history(page)
  yield page if block_given?
  page
end

get_file(url)

GET url and return only its contents

# File lib/mechanize.rb, line 471
def get_file(url)
  get(url).body
end

head(uri, query_params = {}, headers = {}) { |page| ... }

HEAD uri with query_params and headers:

query_params is formatted into a query string using Mechanize::Util.build_query_string, which see.

head('http://example/', {'q' => 'foo'}, {})

# File lib/mechanize.rb, line 483
def head(uri, query_params = {}, headers = {})
  page = @agent.fetch uri, :head, headers, query_params

  yield page if block_given?

  page
end

post(uri, query = {}, headers = {})

POST to the given uri with the given query.

query is processed using Mechanize::Util.each_parameter (which see), and then encoded into an entity body. If any IO/FileUpload object is specified as a field value the “enctype” will be multipart/form-data, or application/x-www-form-urlencoded otherwise.

Examples:

agent.post 'http://example.com/', "foo" => "bar"

agent.post 'http://example.com/', [%w[foo bar]]

agent.post('http://example.com/', "<message>hello</message>",
           'Content-Type' => 'application/xml')

# File lib/mechanize.rb, line 508
def post(uri, query = {}, headers = {})
  return request_with_entity(:post, uri, query, headers) if String === query

  node = {}
  # Create a fake form
  class << node
    def search(*args); []; end
  end
  node['method'] = 'POST'
  node['enctype'] = 'application/x-www-form-urlencoded'

  form = Form.new(node)

  Mechanize::Util.each_parameter(query) { |k, v|
    if v.is_a?(IO)
      form.enctype = 'multipart/form-data'
      ul = Form::FileUpload.new({'name' => k.to_s},::File.basename(v.path))
      ul.file_data = v.read
      form.file_uploads << ul
    elsif v.is_a?(Form::FileUpload)
      form.enctype = 'multipart/form-data'
      form.file_uploads << v
    else
      form.fields << Form::Field.new({'name' => k.to_s},v)
    end
  }
  post_form(uri, form, headers)
end

put(uri, entity, headers = {})

Sends a PUT request to uri with entity as the request body and the given headers:

put('http://example/', 'new content', {'Content-Type' => 'text/plain'})

# File lib/mechanize.rb, line 542
def put(uri, entity, headers = {})
  request_with_entity(:put, uri, entity, headers)
end

request_with_entity(verb, uri, entity, headers = {})

Makes an HTTP request to uri using HTTP method verb. entity is used as the request body, if allowed.
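
The source of this method builds default Content-Type and Content-Length headers and then merges the caller's headers over them. A minimal sketch of that merge on plain hashes, with made-up example values, shows that a header supplied by the caller replaces the default:

```ruby
entity = '<message>hello</message>'

# Headers supplied by the caller (example values).
caller_headers = { 'Content-Type' => 'application/xml' }

# Defaults come first; update lets the caller's headers win on conflict.
headers = {
  'Content-Type'   => 'application/octet-stream',
  'Content-Length' => entity.size.to_s,
}.update(caller_headers)

p headers
```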

# File lib/mechanize.rb, line 550
def request_with_entity(verb, uri, entity, headers = {})
  cur_page = current_page || Page.new

  log.debug("query: #{ entity.inspect }") if log

  headers = {
    'Content-Type' => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update headers

  page = @agent.fetch uri, verb, headers, [entity], cur_page
  add_to_history(page)
  page
end

submit(form, button = nil, headers = {})

Submits form with an optional button.

Without a button:

page = agent.get('http://example.com')
agent.submit(page.forms.first)

With a button:

agent.submit(page.forms.first, page.forms.first.buttons.first)
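
For GET forms, the source of submit removes any existing query string from the form's action before the form's own fields are sent as the query. A small sketch of that substitution on a plain string, using a made-up URL:

```ruby
# Same substitution that submit applies to form.action for GET forms.
action = 'http://example.com/search?lang=en'
base   = action.gsub(/\?[^\?]*$/, '')

puts base
```

This suggests that parameters embedded in the action URL of a GET form are dropped unless the form also carries them as fields.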

# File lib/mechanize.rb, line 577
def submit(form, button = nil, headers = {})
  form.add_button_to_query(button) if button

  case form.method.upcase
  when 'POST'
    post_form(form.action, form, headers)
  when 'GET'
    get(form.action.gsub(/\?[^\?]*$/, ''),
        form.build_query,
        form.page,
        headers)
  else
    raise ArgumentError, "unsupported method: #{form.method.upcase}"
  end
end

transact() { |self| ... }

Runs given block, then resets the page history as it was before. self is given as a parameter to the block. Returns the value of the block.
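
The snapshot-and-restore pattern behind transact can be sketched with a plain Array standing in for the history. The helper name with_restored_history is hypothetical and is not part of mechanize.

```ruby
# Hypothetical helper mirroring the pattern: snapshot, yield, restore.
def with_restored_history(history)
  backup = history.dup
  begin
    yield history
  ensure
    # Runs even if the block raises, so the history is always restored.
    history.replace(backup)
  end
end

history = ['http://example.com/']

result = with_restored_history(history) do |h|
  h << 'http://example.com/download' # recorded only inside the block
  h.size
end

p result
p history
```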

# File lib/mechanize.rb, line 597
def transact
  history_backup = @agent.history.dup
  begin
    yield self
  ensure
    @agent.history = history_backup
  end
end

SSL

Public Instance Methods

ca_file()

Path to an OpenSSL server certificate file

# File lib/mechanize.rb, line 1081
def ca_file
  @agent.ca_file
end

ca_file=(ca_file)

Sets the certificate file used for SSL connections

# File lib/mechanize.rb, line 1088
def ca_file= ca_file
  @agent.ca_file = ca_file
end

cert()

An OpenSSL client certificate or the path to a certificate file.

# File lib/mechanize.rb, line 1095
def cert
  @agent.certificate
end

cert=(cert)

Sets the OpenSSL client certificate cert to the given path or certificate instance

# File lib/mechanize.rb, line 1103
def cert= cert
  @agent.certificate = cert
end

cert_store()

An OpenSSL certificate store for verifying server certificates. This defaults to the default certificate store for your system.

If your system does not ship with a default set of certificates you can retrieve a copy of the set from Mozilla here: curl.haxx.se/docs/caextract.html

(Note that this set does not have an HTTPS download option so you may wish to use the firefox-db2pem.sh script to extract the certificates from a local install to avoid man-in-the-middle attacks.)

After downloading or generating a cacert.pem from the above link you can create a certificate store from the pem file like this:

cert_store = OpenSSL::X509::Store.new
cert_store.add_file 'cacert.pem'

And have mechanize use it with:

agent.cert_store = cert_store
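
A slightly more defensive sketch, using only Ruby's standard openssl library, loads cacert.pem when it is present and otherwise falls back to the platform's default CA locations. The fallback is an assumption about what a script might want, not something mechanize does for you.

```ruby
require 'openssl'

cert_store = OpenSSL::X509::Store.new

if File.exist?('cacert.pem')
  # Use the downloaded or generated bundle when it is present.
  cert_store.add_file 'cacert.pem'
else
  # Otherwise fall back to the platform's default CA locations.
  cert_store.set_default_paths
end

# With a Mechanize instance: agent.cert_store = cert_store
```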

# File lib/mechanize.rb, line 1129
def cert_store
  @agent.cert_store
end

cert_store=(cert_store)

Sets the OpenSSL certificate store to cert_store.

See also cert_store

# File lib/mechanize.rb, line 1138
def cert_store= cert_store
  @agent.cert_store = cert_store
end

key()

An OpenSSL private key or the path to a private key

# File lib/mechanize.rb, line 1154
def key
  @agent.private_key
end

key=(key)

Sets the OpenSSL client key to the given path or key instance. If a path is given, the path must contain an RSA key file.

# File lib/mechanize.rb, line 1162
def key= key
  @agent.private_key = key
end

pass()

OpenSSL client key password

# File lib/mechanize.rb, line 1169
def pass
  @agent.pass
end

pass=(pass)

Sets the client key password to pass

# File lib/mechanize.rb, line 1176
def pass= pass
  @agent.pass = pass
end

ssl_version()

SSL version to use.

# File lib/mechanize.rb, line 1183
def ssl_version
  @agent.ssl_version
end

ssl_version=(ssl_version)

Sets the SSL version to use to ssl_version, without client/server negotiation.

# File lib/mechanize.rb, line 1191
def ssl_version= ssl_version
  @agent.ssl_version = ssl_version
end

verify_callback()

A callback for additional certificate verification. See OpenSSL::SSL::SSLContext#verify_callback

The callback can be used for debugging or to ignore errors by always returning true. Specifying nil uses the default method that was valid when the SSLContext was created.

# File lib/mechanize.rb, line 1203
def verify_callback
  @agent.verify_callback
end

verify_callback=(verify_callback)

Sets the OpenSSL certificate verification callback

# File lib/mechanize.rb, line 1210
def verify_callback= verify_callback
  @agent.verify_callback = verify_callback
end

verify_mode()

The OpenSSL server certificate verification method. The default is OpenSSL::SSL::VERIFY_PEER and certificate verification uses the default system certificates. See also #cert_store

# File lib/mechanize.rb, line 1219
def verify_mode
  @agent.verify_mode
end

verify_mode=(verify_mode)

Sets the OpenSSL server certificate verification method.

# File lib/mechanize.rb, line 1226
def verify_mode= verify_mode
  @agent.verify_mode = verify_mode
end

Settings

Attributes

html_parser[RW]

Default HTML parser for all mechanize instances

Mechanize.html_parser = Nokogiri::XML

log[RW]

Default logger for all mechanize instances

Mechanize.log = Logger.new $stderr

default_encoding[RW]

A default encoding name used when parsing HTML. When set it is used after any other encoding. The default is nil.

force_default_encoding[RW]

Overrides the encodings given by the HTTP server and the HTML page with the #default_encoding when set to true.

html_parser[RW]

The HTML parser to be used when parsing documents

keep_alive_time[RW]

HTTP/1.0 keep-alive time. This is no longer supported by mechanize, as it now uses net-http-persistent, which only supports HTTP/1.1 persistent connections.

pluggable_parser[R]

The pluggable parser maps a response Content-Type to a parser class. The registered Content-Type may be either a full content type like 'image/png' or a media type 'text'. See Mechanize::PluggableParser for further details.

Example:

agent.pluggable_parser['application/octet-stream'] = Mechanize::Download

proxy_addr[R]

The HTTP proxy address

proxy_pass[R]

The HTTP proxy password

proxy_port[R]

The HTTP proxy port

proxy_user[R]

The HTTP proxy username

watch_for_set[RW]

The value of #watch_for_set is passed to pluggable parsers for retrieved content

Public Instance Methods

add_auth(uri, user, password, realm = nil, domain = nil)

Adds credentials user, password for uri. If realm is set the credentials are used only for that realm. If realm is not set the credentials become the default for any realm on that URI.

domain and realm are exclusive as NTLM does not follow RFC 2617. If domain is given it is only used for NTLM authentication.

# File lib/mechanize.rb, line 721
def add_auth uri, user, password, realm = nil, domain = nil
  @agent.add_auth uri, user, password, realm, domain
end

auth(user, password, domain = nil)

NOTE: These credentials will be used as a default for any challenge, which exposes your password to disclosure to malicious servers. Use of this method will warn. This method is deprecated and will be removed in mechanize 3.

Sets the user and password as the default credentials to be used for HTTP authentication for any server. The domain is used for NTLM authentication.

# File lib/mechanize.rb, line 698
  def auth user, password, domain = nil
    caller.first =~ /(.*?):(\d+).*?$/

    warn <<-WARNING
At #{$1} line #{$2}

Use of #auth and #basic_auth are deprecated due to a security vulnerability.

    WARNING

    @agent.add_default_auth user, password, domain
  end
Also aliased as: basic_auth

basic_auth(user, password, domain = nil)
Alias for: auth

conditional_requests()

Are If-Modified-Since conditional requests enabled?

# File lib/mechanize.rb, line 728
def conditional_requests
  @agent.conditional_requests
end

conditional_requests=(enabled)

Enables or disables If-Modified-Since conditional requests according to enabled (they are enabled by default)

# File lib/mechanize.rb, line 735
def conditional_requests= enabled
  @agent.conditional_requests = enabled
end

cookies()

Returns a list of cookies stored in the cookie jar.

# File lib/mechanize.rb, line 756
def cookies
  @agent.cookie_jar.to_a
end

follow_meta_refresh()

Follow HTML meta refresh and HTTP Refresh headers. If set to :anywhere meta refresh tags outside of the head element will be followed.

# File lib/mechanize.rb, line 764
def follow_meta_refresh
  @agent.follow_meta_refresh
end

follow_meta_refresh=(follow)

Controls following of HTML meta refresh and HTTP Refresh headers in responses.

# File lib/mechanize.rb, line 772
def follow_meta_refresh= follow
  @agent.follow_meta_refresh = follow
end

follow_meta_refresh_self()

Follow an HTML meta refresh and HTTP Refresh headers that have no “url=” in the content attribute.

Defaults to false to prevent infinite refresh loops.

# File lib/mechanize.rb, line 782
def follow_meta_refresh_self
  @agent.follow_meta_refresh_self
end

follow_meta_refresh_self=(follow)

Alters the following of HTML meta refresh and HTTP Refresh headers that point to the same page.

# File lib/mechanize.rb, line 790
def follow_meta_refresh_self= follow
  @agent.follow_meta_refresh_self = follow
end

follow_redirect=(follow)
Alias for: redirect_ok=

follow_redirect?()
Alias for: redirect_ok

gzip_enabled()

Is gzip compression of responses enabled?

# File lib/mechanize.rb, line 797
def gzip_enabled
  @agent.gzip_enabled
end

gzip_enabled=(enabled)

Enables or disables HTTP/1.1 gzip compression according to enabled (it is enabled by default)

# File lib/mechanize.rb, line 804
def gzip_enabled=enabled
  @agent.gzip_enabled = enabled
end

idle_timeout()

Connections that have not been used in this many seconds will be reset.

# File lib/mechanize.rb, line 811
def idle_timeout
  @agent.idle_timeout
end

idle_timeout=(idle_timeout)

Sets the idle timeout to idle_timeout. The default timeout is 5 seconds. If you experience “too many connection resets”, reducing this value may help.

# File lib/mechanize.rb, line 819
def idle_timeout= idle_timeout
  @agent.idle_timeout = idle_timeout
end

ignore_bad_chunking()

When set to true mechanize will ignore an EOF during chunked transfer encoding so long as at least one byte was received. Be careful when enabling this as it may cause data loss.

Net::HTTP does not inform mechanize of where in the chunked stream the EOF occurred. Usually it is after the last-chunk but before the terminating CRLF (invalid termination) but it may occur earlier. In the second case your response body may be incomplete.

# File lib/mechanize.rb, line 833
def ignore_bad_chunking
  @agent.ignore_bad_chunking
end

ignore_bad_chunking=(ignore_bad_chunking)

When set to true mechanize will ignore an EOF during chunked transfer encoding. See #ignore_bad_chunking for further details

# File lib/mechanize.rb, line 841
def ignore_bad_chunking= ignore_bad_chunking
  @agent.ignore_bad_chunking = ignore_bad_chunking
end

keep_alive()

Are HTTP/1.1 keep-alive connections enabled?

# File lib/mechanize.rb, line 848
def keep_alive
  @agent.keep_alive
end

keep_alive=(enable)

Disables HTTP/1.1 keep-alive connections if enable is set to false. If you are experiencing “too many connection resets” errors, setting this to false will eliminate them.

You should first investigate reducing idle_timeout.

# File lib/mechanize.rb, line 859
def keep_alive= enable
  @agent.keep_alive = enable
end

log()

The current logger. If no logger has been set on this instance, Mechanize.log is used.

# File lib/mechanize.rb, line 866
def log
  @log || Mechanize.log
end

log=(logger)

Sets the logger used by this instance of mechanize

# File lib/mechanize.rb, line 873
def log= logger
  @log = logger
end

max_file_buffer()

Responses larger than this will be written to a Tempfile instead of stored in memory. The default is 100,000 bytes.

A value of nil disables creation of Tempfiles.

# File lib/mechanize.rb, line 883
def max_file_buffer
  @agent.max_file_buffer
end

max_file_buffer=(bytes)

Sets the maximum size of a response body that will be stored in memory to bytes. A value of nil causes all response bodies to be stored in memory.

Note that for Mechanize::Download subclasses, the maximum buffer size multiplied by the number of pages stored in history (controlled by max_history) is an approximate upper limit on the amount of memory Mechanize will use. By default, Mechanize can use up to ~5MB to store response bodies for non-File and non-Page (HTML) responses.

See also the discussion under max_history=
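
The ~5MB figure follows directly from the two defaults stated in this document, as this short calculation shows:

```ruby
max_file_buffer = 100_000 # default buffer size, in bytes
max_history     = 50      # default number of history entries

# Approximate upper limit on memory used for buffered response bodies.
upper_bound = max_file_buffer * max_history

puts "#{upper_bound} bytes (~#{upper_bound / 1_000_000} MB)"
```

Raising either setting scales this bound proportionally, and setting either one to nil removes the bound.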

# File lib/mechanize.rb, line 900
def max_file_buffer= bytes
  @agent.max_file_buffer = bytes
end

open_timeout()

Length of time to wait until a connection is opened in seconds

# File lib/mechanize.rb, line 907
def open_timeout
  @agent.open_timeout
end

open_timeout=(open_timeout)

Sets the connection open timeout to open_timeout

# File lib/mechanize.rb, line 914
def open_timeout= open_timeout
  @agent.open_timeout = open_timeout
end

read_timeout()

Length of time to wait for data from the server

# File lib/mechanize.rb, line 921
def read_timeout
  @agent.read_timeout
end

read_timeout=(read_timeout)

Sets the timeout for each chunk of data read from the server to read_timeout. A single request may read many chunks of data.

# File lib/mechanize.rb, line 929
def read_timeout= read_timeout
  @agent.read_timeout = read_timeout
end

redirect_ok()

Controls how mechanize deals with redirects. The following values are allowed:

:all, true

All 3xx redirects are followed (default)

:permanent

Only 301 Moved Permanently redirects are followed

false

No redirects are followed
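
The table above can be read as a small decision function. The sketch below is an illustration of the documented policy only, not mechanize's internal implementation; the helper name follow? is hypothetical.

```ruby
# Hypothetical helper: would a response with this status be followed
# under the given redirect_ok policy?
def follow?(policy, status)
  redirect = (300..399).cover?(status)

  case policy
  when :all, true then redirect       # every 3xx redirect
  when :permanent then status == 301  # only 301 Moved Permanently
  else false                          # false: never follow
  end
end

p follow?(:all, 302)
p follow?(:permanent, 302)
```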

# File lib/mechanize.rb, line 941
def redirect_ok
  @agent.redirect_ok
end
Also aliased as: follow_redirect?

redirect_ok=(follow)

Sets the mechanize redirect handling policy. See #redirect_ok for allowed values

# File lib/mechanize.rb, line 951
def redirect_ok= follow
  @agent.redirect_ok = follow
end
Also aliased as: follow_redirect=

redirection_limit()

Maximum number of redirections to follow

# File lib/mechanize.rb, line 960
def redirection_limit
  @agent.redirection_limit
end

redirection_limit=(limit)

Sets the maximum number of redirections to follow to limit

# File lib/mechanize.rb, line 967
def redirection_limit= limit
  @agent.redirection_limit = limit
end

request_headers()

A hash of custom request headers that will be sent on every request

# File lib/mechanize.rb, line 980
def request_headers
  @agent.request_headers
end

request_headers=(request_headers)

Replaces the custom request headers that will be sent on every request with request_headers

# File lib/mechanize.rb, line 988
def request_headers= request_headers
  @agent.request_headers = request_headers
end

resolve(link)

Resolve the full path of a link / uri

# File lib/mechanize.rb, line 973
def resolve link
  @agent.resolve link
end

retry_change_requests()

Retry POST and other non-idempotent requests. See RFC 2616 9.1.2.

# File lib/mechanize.rb, line 995
def retry_change_requests
  @agent.retry_change_requests
end

retry_change_requests=(retry_change_requests)

When setting retry_change_requests to true you are stating that, for all the URLs you access with mechanize, making POST and other non-idempotent requests is safe and will not cause data duplication or other harmful results.

If you are experiencing “too many connection resets” errors you should instead investigate reducing the #idle_timeout or disabling #keep_alive connections.

# File lib/mechanize.rb, line 1009
def retry_change_requests= retry_change_requests
  @agent.retry_change_requests = retry_change_requests
end

robots()

Will /robots.txt files be obeyed?

# File lib/mechanize.rb, line 1016
def robots
  @agent.robots
end

robots=(enabled)

When enabled mechanize will retrieve and obey robots.txt files

# File lib/mechanize.rb, line 1024
def robots= enabled
  @agent.robots = enabled
end

scheme_handlers()

The handlers for HTTP and other URI protocols.

# File lib/mechanize.rb, line 1031
def scheme_handlers
  @agent.scheme_handlers
end

scheme_handlers=(scheme_handlers)

Replaces the URI scheme handler table with scheme_handlers

# File lib/mechanize.rb, line 1038
def scheme_handlers= scheme_handlers
  @agent.scheme_handlers = scheme_handlers
end

user_agent()

The identification string for the client initiating a web request

# File lib/mechanize.rb, line 1045
def user_agent
  @agent.user_agent
end

user_agent=(user_agent)

Sets the User-Agent used by mechanize to user_agent. See also user_agent_alias

# File lib/mechanize.rb, line 1053
def user_agent= user_agent
  @agent.user_agent = user_agent
end

user_agent_alias=(name)

Set the user agent for the Mechanize object based on the given name.

See also AGENT_ALIASES
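
An unknown alias name raises ArgumentError rather than being ignored. The lookup-or-raise idiom can be sketched with a stand-in table; the aliases hash, its value and the lookup_alias helper are hypothetical, and the real table is AGENT_ALIASES.

```ruby
# Hypothetical stand-in for the alias table (the real one is AGENT_ALIASES).
aliases = { 'Mechanize' => 'Mechanize/x.y (example user agent string)' }

# Returns the user agent string for name, or raises for an unknown alias.
def lookup_alias(table, name)
  table[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end

puts lookup_alias(aliases, 'Mechanize')

begin
  lookup_alias(aliases, 'Mac Safri') # misspelled on purpose
rescue ArgumentError => e
  puts e.message
end
```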

# File lib/mechanize.rb, line 1062
def user_agent_alias= name
  self.user_agent = AGENT_ALIASES[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end

Utilities

Constants

VERSION

Public Instance Methods

parse(uri, response, body)

Parses the body of the response from uri using the pluggable parser that matches its content type
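
Before the parser is looked up, the Content-Type header is reduced to a lowercase media type without parameters. The sketch below reproduces that normalization step on plain strings; the helper name media_type is hypothetical.

```ruby
# Hypothetical helper reproducing the normalization performed by parse.
def media_type(header)
  return nil if header.nil?

  # Drop parameters such as "; charset=UTF-8".
  data, = header.split(';', 2)

  # Lowercase, and keep only the first type if several are comma-separated.
  type, = data.downcase.split(',', 2) unless data.nil?
  type
end

p media_type('Text/HTML; charset=UTF-8')
p media_type(nil)
```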

# File lib/mechanize.rb, line 1238
def parse uri, response, body
  content_type = nil

  unless response['Content-Type'].nil?
    data, = response['Content-Type'].split ';', 2
    content_type, = data.downcase.split ',', 2 unless data.nil?
  end

  parser_klass = @pluggable_parser.parser content_type

  unless parser_klass <= Mechanize::Download then
    body = case body
           when IO, Tempfile, StringIO then
             body.read
           else
             body
           end
  end

  parser_klass.new uri, response, body, response.code do |parser|
    parser.mech = self if parser.respond_to? :mech=

    parser.watch_for_set = @watch_for_set if
      @watch_for_set and parser.respond_to?(:watch_for_set=)
  end
end

reset()

Clears history and cookies.

# File lib/mechanize.rb, line 1289
def reset
  @agent.reset
end

set_proxy(address, port, user = nil, password = nil)

Sets the proxy address at port with an optional user and password

# File lib/mechanize.rb, line 1277
def set_proxy address, port, user = nil, password = nil
  @proxy_addr = address
  @proxy_port = port
  @proxy_user = user
  @proxy_pass = password

  @agent.set_proxy address, port, user, password
end

shutdown()

Shuts down this session by clearing browsing state and closing all persistent connections.

# File lib/mechanize.rb, line 1297
def shutdown
  reset
  @agent.shutdown
end