Documentation - Scrape API

ScrapingBytes - Controls

These are the JSON parameters you can provide to the Scrape API to control the browser. Please note that if render_js is disabled, we send a plain HTTP request instead of using a headless browser.

Name Type Default Description
url string required The URL of the page you want to scrape
render_js bool true Fetch the website with a headless browser and render JavaScript
premium_proxy bool false Use a premium proxy to bypass difficult websites. This uses a pool of residential proxies
screenshot bool false Return a screenshot of the page you want to scrape. This attempts to fetch the full page of the website. This is returned as base64
mobile bool false Control if a mobile device will be used to send the request. Defaults to desktop
block_resources bool true Block resources such as images
block_ads bool true Block ads on the page you want to scrape
window_height int 1080 Height in pixels of the browser window and viewport used to scrape the page
window_width int 1920 Width in pixels of the browser window and viewport used to scrape the page
timeout int 130 How long to wait before timing out your request. Maximum is 130 seconds
instructions array null Instructions to send to control the browser. Perform clicks, waiting, scrolling, and more.

Want to quickly build the JSON required? Try the request builder on the dashboard.
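For reference, a payload combining several of the parameters above might look like the following (all values are illustrative):

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": true,
    "premium_proxy": false,
    "screenshot": true,
    "mobile": false,
    "window_width": 1920,
    "window_height": 1080,
    "timeout": 130
}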


URL

This parameter is the full URL, including the protocol (http/https), of the page you want to scrape. You must URL-encode the URL before sending it to us. If you don't see an example for your language here, please refer to its documentation.

Python:

import requests

encoded_url = requests.utils.quote("https://playground.scrapingbytes.com/")
print(encoded_url)

Go:

package main

import (
    "fmt"
    "net/url"
)

func main() {
    encodedUrl := url.QueryEscape("https://playground.scrapingbytes.com/")
    fmt.Println(encodedUrl) // prints the URL-encoded string
}

Render JavaScript

By default, ScrapingBytes fetches the website with a headless browser and renders JavaScript. This default behavior costs 5 credits per request.

To fetch a website without the headless browser, set render_js to false. Please note, this falls back to using plain HTTP requests.
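For example, this payload disables the headless browser and is served over a plain HTTP request:

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": false
}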


Premium Proxy

By default, this is false and we use a datacenter proxy from our pool of proxies. When turned on, we fetch the website through our pool of residential proxies instead. Some websites that are harder to scrape require a premium proxy; if you're having issues scraping a target website, try turning this on.

Examples of websites where this might be required include search engines, social networks, and e-commerce websites.

Each request with this parameter costs 25 API credits when render_js is enabled, or 10 credits when used with plain HTTP requests.
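For example, this payload enables the premium proxy together with the headless browser, costing 25 credits per request:

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": true,
    "premium_proxy": true
}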


Instructions

By default, this is null. When instructions are sent, we perform the actions you specify in the browser. The value must be an array of instruction objects, and all instructions must complete before the timeout duration elapses.

Instructions are completed from the top of the list to the bottom.

All elements can be found by class, id, or xpath.

Below is an example of all available instructions.

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": true,
    "premium_proxy": true,
    "instructions": [
        {"wait_for": ".listing-clickbox"}, // Will wait for a max of 30 seconds
        {"click": "#load_more"},
        {"wait": 5}, // Time in seconds. Can be a float
        {"scroll_y": 500},
        {"scroll_x": 800},
        {"fill": [".selector", "My custom value"]},
    ]
}
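As a rough sketch, the same payload could be sent from Python as shown below. The comments in the JSON above are for illustration and must be removed before sending it as literal JSON. The endpoint URL and the api_key field here are assumptions, not confirmed by this page; check the authentication section of the documentation for the exact values.

import requests

payload = {
    "url": "https://playground.scrapingbytes.com/",
    "render_js": True,
    "premium_proxy": True,
    "instructions": [
        {"wait_for": ".listing-clickbox"},          # waits up to 30 seconds
        {"click": "#load_more"},
        {"wait": 5},                                # seconds, can be a float
        {"scroll_y": 500},
        {"scroll_x": 800},
        {"fill": [".selector", "My custom value"]},
    ],
}

# Assumed endpoint and auth field, shown for illustration only.
response = requests.post(
    "https://api.scrapingbytes.com/v1/scrape",      # assumed endpoint
    json={"api_key": "YOUR_API_KEY", **payload},    # assumed auth parameter
)
print(response.status_code)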