Scraper extractor types explained

We support the following extractor types:

Text

Select this extractor type if you want to extract only the text content of the element matched by your selector, with all HTML markup removed.

For example, if your target is:

<h1>This is the page title</h1>

The extraction will be:

{
  "page_title": "This is the page title"
}
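As a rough illustration of what a Text extraction does, here is a minimal Python sketch. It is not the scraper's actual implementation: it uses the standard library's xml.etree.ElementTree, which only works here because the snippet is well-formed, and the function name extract_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_text(markup: str) -> str:
    """Return the text of an element and all of its descendants, without tags."""
    element = ET.fromstring(markup)
    # itertext() walks the element tree and yields every piece of text in order.
    return "".join(element.itertext())


# "page_title" stands in for the variable name you would define in the scraper.
result = {"page_title": extract_text("<h1>This is the page title</h1>")}
print(result)
```

Nested tags are dropped and only their text is kept, so an h1 containing a span would still produce a single plain string.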

Inner HTML

Select this extractor type if you want to extract everything inside the selected element: its text content plus any nested HTML markup. The selected element's own opening and closing tags are not included.

For example, if your target is:

<h1>This is the <span>page title</span></h1>

The extraction will be:

{
  "title_inner_html": "This is the <span>page title</span>"
}
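The same idea can be sketched in Python. This is an assumption-laden illustration, not the scraper's code: it relies on the standard library's xml.etree.ElementTree (suitable only for well-formed snippets like this one), and inner_html is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def inner_html(markup: str) -> str:
    """Serialize everything inside the root element, excluding the root's own tags."""
    element = ET.fromstring(markup)
    # Text that appears before the first child element.
    parts = [element.text or ""]
    for child in element:
        # tostring() serializes the child with its tags and any text that follows it.
        parts.append(ET.tostring(child, encoding="unicode", method="html"))
    return "".join(parts)


result = {"title_inner_html": inner_html("<h1>This is the <span>page title</span></h1>")}
print(result)
```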

Outer HTML

Select this extractor type if you want to extract the selected element in full: its own opening and closing tags together with all the text and HTML markup inside them.

For example, if your target is:

<h1>This is the <span>page title</span></h1>

The extraction will be:

{
  "title_outer_html": "<h1>This is the <span>page title</span></h1>"
}
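A minimal Python sketch of this behaviour follows, assuming a well-formed snippet that the standard library's xml.etree.ElementTree can parse. The helper name outer_html is hypothetical and the code is illustrative only, not the scraper's implementation.

```python
import xml.etree.ElementTree as ET


def outer_html(markup: str) -> str:
    """Serialize the root element itself, including its own tags and all children."""
    element = ET.fromstring(markup)
    return ET.tostring(element, encoding="unicode", method="html")


result = {"title_outer_html": outer_html("<h1>This is the <span>page title</span></h1>")}
print(result)
```

The only difference from an Inner HTML extraction is that the wrapping h1 tags are kept.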

Attribute

Select this extractor type if you want to extract the value of an HTML attribute on the selected element.

For example, if you target the href attribute of this element:

<a href="https://google.com">click here</a>

The value "https://google.com" will be extracted and stored under the variable name you have defined (here, link):

{
  "link": "https://google.com"
}

You can use any HTML attribute. Some common examples are: href, id, class, src, alt, style, and type.
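To make the attribute lookup concrete, here is a short Python sketch. It assumes a well-formed snippet parsed with the standard library's xml.etree.ElementTree; extract_attribute is a hypothetical helper, and returning None for a missing attribute is a choice made for this sketch, not documented scraper behaviour.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_attribute(markup: str, name: str) -> Optional[str]:
    """Return the value of the named attribute, or None if the element lacks it."""
    element = ET.fromstring(markup)
    return element.get(name)


anchor = '<a href="https://google.com">click here</a>'
# "link" stands in for the variable name you would define in the scraper.
result = {"link": extract_attribute(anchor, "href")}
print(result)
```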

Collection

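Select this extractor type if you want to extract repeated groups of related data. A collection selector matches each repeated element, and its sub-selectors extract the fields inside each match. The results are returned as a list of objects nested under the collection's key. This is useful for repeating structures such as cards and table rows.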

For example, a collection can turn this markup:

<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>

Into this:

{
  "coins": [
    {
      "name": "Botcoin",
      "price": "$19"
    },
    {
      "name": "Etirum",
      "price": "$17"
    }
  ]
}
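The way a collection groups sub-selector results can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree with its limited XPath syntax instead of CSS selectors, it needs well-formed markup, and the names extract_collection, item_path and fields are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

MARKUP = """
<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>
"""


def extract_collection(markup: str, item_path: str, fields: dict) -> list:
    """Build one dict per element matching item_path, with one key per sub-selector."""
    root = ET.fromstring(markup.strip())
    return [
        # findtext() returns the text of the first match inside the current item.
        {key: item.findtext(path) for key, path in fields.items()}
        for item in root.findall(item_path)
    ]


coins = extract_collection(
    MARKUP,
    item_path="./div[@class='coin']",
    fields={"name": "./h3[@class='name']", "price": "./p[@class='price']"},
)
result = {"coins": coins}
print(json.dumps(result, indent=2))
```

Each sub-selector is evaluated relative to its own item, which is what keeps every name paired with the price from the same card.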