Scraper extractor types explained
We support the following extractor types:
Text
Select this extractor type to extract the text content of the element matched by your selector. Any HTML tags are dropped and only the text is kept.
For example, if your target is:
<h1>This is the page title</h1>
The extraction, stored under the variable name you defined (here, page_title), will be:
{
  "page_title": "This is the page title"
}
Inner HTML
Select this extractor type to extract everything inside the selected element: its text content plus any nested HTML markup. The tags of the selected element itself are not included.
For example, if your target is:
<h1>This is the <span>page title</span></h1>
The extraction will be:
{
  "title_inner_html": "This is the <span>page title</span>"
}
Outer HTML
Select this extractor type to extract the whole selected element: its own opening and closing tags plus everything inside them.
For example, if your target is:
<h1>This is the <span>page title</span></h1>
The extraction will be:
{
  "title_outer_html": "<h1>This is the <span>page title</span></h1>"
}
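To make the difference between Text, Inner HTML, and Outer HTML concrete, here is a minimal Python sketch that reproduces the three results for the same element. It is only an illustration, not how the scraper is implemented: the helper names text_content, inner_html, and outer_html are hypothetical, and it uses Python's standard xml.etree.ElementTree parser, which works here only because the snippet is well-formed markup.

```python
import xml.etree.ElementTree as ET

# Same markup as the Inner HTML and Outer HTML examples above.
element = ET.fromstring("<h1>This is the <span>page title</span></h1>")


def text_content(el):
    # Text: join every text node inside the element, dropping all tags.
    return "".join(el.itertext())


def inner_html(el):
    # Inner HTML: the element's leading text plus each child serialized
    # with its tags (tostring also emits the text that follows a child).
    children = "".join(
        ET.tostring(child, encoding="unicode", method="html") for child in el
    )
    return (el.text or "") + children


def outer_html(el):
    # Outer HTML: the element itself, including its own tags.
    return ET.tostring(el, encoding="unicode", method="html")


print(text_content(element))
print(inner_html(element))
print(outer_html(element))
```

The three printed lines should match the values shown in the Text, Inner HTML, and Outer HTML examples, in that order.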
Attribute
Select this extractor type to extract the value of an HTML attribute on the selected element.
For example, if your target is the attribute href:
<a href="https://google.com">click here</a>
"https://google.com" will be extracted and assigned to the variable name you have defined (here, link):
{
  "link": "https://google.com"
}
You can use any HTML attribute. Some common examples are: href, id, class, src, alt, style, and type.
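As a rough illustration of attribute extraction, the sketch below reads href from the same link using Python's standard library. The function name extract_attribute is hypothetical, and returning None for a missing attribute is a choice made for this sketch, not documented scraper behavior.

```python
import xml.etree.ElementTree as ET

# Same markup as the Attribute example above.
link = ET.fromstring('<a href="https://google.com">click here</a>')


def extract_attribute(el, name):
    # In this sketch, a missing attribute yields None (an assumption,
    # not a statement about what the scraper does).
    return el.get(name)


# "link" plays the role of the variable name you defined.
result = {"link": extract_attribute(link, "href")}
print(result)
```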
Collection
Use this option to extract repeated groups of related data. A collection has its own sub-selectors, and each element matched by the collection becomes one object in a list stored under a single key. This is useful for cards and table rows.
You could extract something like this:
<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>
Into this:
{
  "coins": [
    {
      "name": "Botcoin",
      "price": "$19"
    },
    {
      "name": "Etirum",
      "price": "$17"
    }
  ]
}
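The grouping a collection performs can be sketched in a few lines of Python. This is an illustration under assumptions, not the scraper's implementation: extract_collection is a hypothetical helper, the selectors are written in the limited XPath syntax of Python's standard xml.etree.ElementTree rather than as CSS selectors, and the markup is parsed as XML because this snippet happens to be well-formed.

```python
import json
import xml.etree.ElementTree as ET

# Same markup as the Collection example above.
MARKUP = """
<div>
  <div class="coin">
    <h3 class="name">Botcoin</h3>
    <p class="price">$19</p>
  </div>
  <div class="coin">
    <h3 class="name">Etirum</h3>
    <p class="price">$17</p>
  </div>
</div>
""".strip()


def extract_collection(root, item_path, fields):
    # One dict per element matched by item_path; each sub-selector in
    # `fields` is evaluated relative to that element.
    return [
        {key: item.findtext(path) for key, path in fields.items()}
        for item in root.findall(item_path)
    ]


root = ET.fromstring(MARKUP)
result = {
    "coins": extract_collection(
        root,
        ".//div[@class='coin']",
        {"name": "h3[@class='name']", "price": "p[@class='price']"},
    )
}
print(json.dumps(result, indent=2))
```

Because each sub-selector is resolved inside its own parent element, every name stays paired with the price from the same card instead of ending up in two unrelated lists.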