Documentation - Scrape API
ScrapingBytes - Controls
These are the JSON parameters you can send with a Scrape API request to control the browser. Please note that if render_js is disabled, we send an HTTP request instead of using a headless browser.
Name | Type | Default | Description |
---|---|---|---|
url | string | required | The URL of the page you want to scrape |
render_js | bool | true | Fetch the website with a headless browser and render JavaScript |
premium_proxy | bool | false | Use a premium proxy to bypass difficult websites. This uses a pool of residential proxies |
screenshot | bool | false | Return a screenshot of the page you want to scrape. This attempts to capture the full page. The screenshot is returned as base64 |
mobile | bool | false | Whether a mobile device is used to send the request. Defaults to desktop |
block_resources | bool | true | Block resources such as images |
block_ads | bool | true | Block ads on the page you want to scrape |
window_height | int | 1080 | Height in pixels of the browser window and viewport used to scrape the page |
window_width | int | 1920 | Width in pixels of the browser window and viewport used to scrape the page |
timeout | int | 130 | How long to wait, in seconds, before timing out your request. Maximum is 130 seconds |
instructions | array | null | Instructions to send to control the browser. Perform clicks, waiting, scrolling, and more |
Want to quickly build the JSON required? Try the request builder on the dashboard.
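As a rough illustration of how the parameters in the table fit together, the sketch below builds the JSON in Python. The helper name build_payload is hypothetical and not part of ScrapingBytes; the defaults and the 130-second limit are copied from the table. The URL is passed through unchanged, so encode it first if your request requires it.

```python
import json

# Defaults copied from the parameter table above.
DEFAULTS = {
    "render_js": True,
    "premium_proxy": False,
    "screenshot": False,
    "mobile": False,
    "block_resources": True,
    "block_ads": True,
    "window_height": 1080,
    "window_width": 1920,
    "timeout": 130,
    "instructions": None,
}

MAX_TIMEOUT = 130  # seconds, the documented maximum


def build_payload(url, **overrides):
    """Hypothetical helper: merge overrides onto the documented defaults."""
    if not url:
        raise ValueError("url is required")
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"url": url, **DEFAULTS, **overrides}
    if not 0 < payload["timeout"] <= MAX_TIMEOUT:
        raise ValueError("timeout must be greater than 0 and at most 130 seconds")
    return payload


payload = build_payload(
    "https://playground.scrapingbytes.com/",
    screenshot=True,
    window_width=1280,
)
body = json.dumps(payload)  # JSON string containing the parameters
```

Rejecting unknown parameter names locally is a design choice of this sketch; it catches typos such as renderjs before a request is spent.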
URL
This parameter is the full URL, including the protocol (http/https), that you want to scrape. You must URL encode your URL before sending it to us. If you don't see an example for your language here, please refer to its documentation.
import requests
# safe="" also encodes "/" so the result matches the Go example
encoded_url = requests.utils.quote("https://playground.scrapingbytes.com/", safe="")
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// QueryEscape percent-encodes the URL, including ":" and "/"
	encodedURL := url.QueryEscape("https://playground.scrapingbytes.com/")
	fmt.Println(encodedURL)
}
Render JavaScript
By default, ScrapingBytes fetches the website with a headless browser. This costs 5 credits per request.
To fetch a website without the headless browser, set render_js to false. Please note that this falls back to using HTTP requests.
Premium Proxy
By default, this is false. When turned on, we try fetching the website from our pool of residential proxies. If disabled, we use a datacenter proxy from our pool of proxies.
Some websites that are harder to scrape will require a premium proxy. If you're having issues scraping a target website try turning this on.
Examples of websites where this might be required include search engines, social networks, and ecommerce websites.
Each request with this parameter costs 25 API credits with render_js enabled. If used with HTTP requests, it costs 10 credits.
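The credit figures quoted in this document can be collected into a small lookup. This is only a sketch: estimate_credits is a hypothetical helper, and it returns None for a plain HTTP request without a premium proxy because that cost is not stated here.

```python
from typing import Optional


def estimate_credits(render_js: bool = True, premium_proxy: bool = False) -> Optional[int]:
    """Hypothetical helper: credits per request, using the documented figures."""
    if premium_proxy:
        # 25 credits with the headless browser, 10 with HTTP requests.
        return 25 if render_js else 10
    if render_js:
        # Default behavior: headless browser without a premium proxy.
        return 5
    # Plain HTTP without a premium proxy: cost not given in this document.
    return None


default_cost = estimate_credits()
premium_browser_cost = estimate_credits(render_js=True, premium_proxy=True)
premium_http_cost = estimate_credits(render_js=False, premium_proxy=True)
```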
Instructions
By default, this is null. When instructions are sent, we perform the actions you send in the browser. This must be an array containing objects of instructions.
All instructions must be completed before the timeout duration.
Instructions are completed from the top of the list to the bottom.
All elements can be found by class, id, or xpath.
Below is an example of all available instructions. wait_for waits for a maximum of 30 seconds, and wait takes a time in seconds that can be a float.
{
  "url": "https://playground.scrapingbytes.com/",
  "render_js": true,
  "premium_proxy": true,
  "instructions": [
    {"wait_for": ".listing-clickbox"},
    {"click": "#load_more"},
    {"wait": 5},
    {"scroll_y": 500},
    {"scroll_x": 800},
    {"fill": [".selector", "My custom value"]}
  ]
}
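Before sending a request, it can help to check an instruction list locally. The sketch below is a hypothetical client-side check, not the service's own validation. It assumes each instruction object carries exactly one key, as in the example above, and it only flags lists whose explicit wait values alone would reach the timeout.

```python
from numbers import Real

# Instruction names taken from the example above.
SELECTOR_KEYS = {"wait_for", "click"}
NUMERIC_KEYS = {"wait", "scroll_y", "scroll_x"}


def validate_instructions(instructions, timeout=130):
    """Hypothetical check; the service may accept or reject differently."""
    if not isinstance(instructions, list):
        raise TypeError("instructions must be an array (list) of objects")
    total_wait = 0.0
    for step in instructions:
        # Assumption: one instruction per object, as in the example.
        if not isinstance(step, dict) or len(step) != 1:
            raise ValueError(f"expected an object with one key: {step!r}")
        (name, value), = step.items()
        if name in SELECTOR_KEYS:
            if not isinstance(value, str):
                raise ValueError(f"{name} expects a selector string")
        elif name in NUMERIC_KEYS:
            if isinstance(value, bool) or not isinstance(value, Real):
                raise ValueError(f"{name} expects a number")
            if name == "wait":
                total_wait += value
        elif name == "fill":
            if not (isinstance(value, list) and len(value) == 2
                    and isinstance(value[0], str)):
                raise ValueError("fill expects [selector, value]")
        else:
            raise ValueError(f"unknown instruction: {name}")
    # All instructions must complete before the timeout duration.
    if total_wait >= timeout:
        raise ValueError("explicit waits alone reach the timeout")
    return instructions


steps = [
    {"wait_for": ".listing-clickbox"},
    {"click": "#load_more"},
    {"wait": 5},
    {"scroll_y": 500},
    {"scroll_x": 800},
    {"fill": [".selector", "My custom value"]},
]
validate_instructions(steps)
```

The list is returned unchanged so the call can be placed inline when assembling the request JSON.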