Scraper extractor types explained

The scraper supports the following extractor types: Text, Inner HTML, Outer HTML, Attribute, and Collection.


Text

Select this extractor type if you want to extract the text content of the element matched by your selector, without any HTML markup.

For example, if your target is:

<h1>This is the page title</h1>

The extraction will be: 

{
  "page_title": "This is the page title"
}
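The behaviour described above can be approximated with a minimal Python sketch. It is only an illustration of what "text content" means, not the scraper's actual implementation: it uses the standard library's xml.etree.ElementTree, which assumes well-formed markup, and the helper name extract_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_text(markup: str) -> str:
    """Return the text of the element with all tags dropped (hypothetical helper)."""
    element = ET.fromstring(markup)  # assumption: the markup is well-formed
    # itertext() walks the element and its descendants, yielding only text
    return "".join(element.itertext())


result = {"page_title": extract_text("<h1>This is the page title</h1>")}
print(result)
```

Nested tags are dropped as well, so an input such as <h1>This is the <span>page title</span></h1> would produce the same text in this sketch.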


Inner HTML

Select this extractor type if you want to extract both the text content and the HTML markup inside the selected element. The element's own opening and closing tags are not included.

For example, if your target is:

<h1>This is the <span>page title</span></h1>

The extraction will be:

{
  "title_inner_html": "This is the <span>page title</span>"
}


Outer HTML

Select this extractor type if you want to extract the selected element in full: its own opening and closing tags plus all text and markup inside it.

For example, if your target is:

<h1>This is the <span>page title</span></h1>

The extraction will be:

{
  "title_outer_html": "<h1>This is the <span>page title</span></h1>"
}
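To make the difference between Inner HTML and Outer HTML concrete, here is a small sketch in Python. It is not how the scraper is implemented; it relies on the standard library's xml.etree.ElementTree, which only handles well-formed markup, and the helper names inner_html and outer_html are hypothetical.

```python
import xml.etree.ElementTree as ET


def outer_html(element: ET.Element) -> str:
    """Serialize the element including its own tags (hypothetical helper)."""
    # Intended for the parsed root element, which has no trailing text
    return ET.tostring(element, encoding="unicode")


def inner_html(element: ET.Element) -> str:
    """Serialize only what sits between the element's tags (hypothetical helper)."""
    parts = [element.text or ""]  # text before the first child, if any
    # Serializing a child also includes the text that follows it
    parts.extend(ET.tostring(child, encoding="unicode") for child in element)
    return "".join(parts)


# assumption: the markup is well-formed
h1 = ET.fromstring("<h1>This is the <span>page title</span></h1>")
result = {
    "title_inner_html": inner_html(h1),
    "title_outer_html": outer_html(h1),
}
print(result)
```

In this sketch the only difference between the two values is the wrapping <h1> tag, which matches the two examples in this article.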


Attribute

Select this extractor type if you want to extract the value of an HTML attribute on the selected element.

For example, if your target is the attribute href:

<a href="https://google.com">click here</a>

The value "https://google.com" will be extracted and assigned to the variable name you defined, which is link in this example:

{
  "link": "https://google.com"
}

You can use any HTML attribute. Some common examples are: href, id, class, src, alt, style, and type.
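As a rough illustration of attribute extraction, the following Python sketch reads one attribute from a parsed element. It is a sketch under assumptions rather than the scraper's own code: it uses the standard library's xml.etree.ElementTree on well-formed markup, and extract_attribute is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract_attribute(markup: str, attribute: str) -> Optional[str]:
    """Return the value of one attribute, or None if it is absent (hypothetical helper)."""
    element = ET.fromstring(markup)  # assumption: the markup is well-formed
    return element.get(attribute)


anchor = '<a href="https://google.com">click here</a>'
result = {"link": extract_attribute(anchor, "href")}
print(result)
```

In this sketch a missing attribute yields None. How the scraper itself reports a missing attribute is not covered in this article.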


Collection

Select this extractor type if you want to extract groups of related data together. A collection has sub-selectors, and their results are grouped under a nested key with one entry per matched element. It is useful for repeating structures such as cards and table rows.

You could extract something like this:

<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>

Into this:

{
  "coins": [
    {
      "name": "Botcoin",
      "price": "$19"
    },
    {
      "name": "Etirum",
      "price": "$17"
    }
  ]
}
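The grouping that a collection performs can be sketched in Python as follows. This is an illustration only and not the scraper's implementation: it uses the standard library's xml.etree.ElementTree on well-formed markup, it matches on the exact value of the class attribute instead of real CSS selectors, and the helper name extract_collection and its parameters are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

MARKUP = """
<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>
"""


def extract_collection(markup: str, item_class: str, fields: dict) -> list:
    """Build one record per matching item (hypothetical helper).

    `fields` maps each output key to the class of the sub-element to read.
    """
    root = ET.fromstring(markup.strip())  # assumption: well-formed markup
    records = []
    # Exact match on the class attribute, a simplification of a CSS selector
    for item in root.findall(f".//*[@class='{item_class}']"):
        record = {}
        for key, field_class in fields.items():
            match = item.find(f".//*[@class='{field_class}']")
            record[key] = (
                "".join(match.itertext()).strip() if match is not None else None
            )
        records.append(record)
    return records


result = {
    "coins": extract_collection(MARKUP, "coin", {"name": "name", "price": "price"})
}
print(json.dumps(result, indent=2))
```

Each sub-selector is evaluated relative to its own item, which is what keeps every name paired with the right price.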
