Documentation - Scrape API

ScrapingBytes - Controls

These are the JSON parameters you can provide to the Scrape API to control the browser. Please note that if render_js is disabled, we send a plain HTTP request instead of using a headless browser.

Name Type Default Description
url string required The URL of the page you want to scrape
render_js bool true Fetch the website with a headless browser and render JavaScript
premium_proxy bool false Use a premium proxy to bypass difficult websites. This uses a pool of residential proxies
screenshot bool false Return a screenshot of the page you want to scrape. This attempts to fetch the full page of the website. This is returned as base64
mobile bool false Control if a mobile device will be used to send the request. Defaults to desktop
block_resources bool true Block resources such as images
block_ads bool true Block ads on the page you want to scrape
window_height int 1080 Height in pixels of the browser window and viewport used to scrape the page
window_width int 1920 Width in pixels of the browser window and viewport used to scrape the page
timeout int 130 How long to wait before timing out your request. Maximum is 130 seconds
instructions array null Instructions to send to control the browser. Perform clicks, waiting, scrolling, and more.

Want to quickly build the JSON required? Try the request builder on the dashboard.
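For reference, a payload combining several of the parameters above might look like the following (all values are illustrative):

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": true,
    "premium_proxy": false,
    "screenshot": true,
    "mobile": false,
    "window_width": 1920,
    "window_height": 1080,
    "timeout": 130
}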


URL

This parameter is the full URL, including the protocol (http/https), of the page you want to scrape. You must URL-encode the URL before sending it to us. If you don't see an example for your language here, please refer to its documentation.

Python:

import requests

encoded_url = requests.utils.quote("https://playground.scrapingbytes.com/")
print(encoded_url)

Go:

package main

import (
    "fmt"
    "net/url"
)

func main() {
    encodedUrl := url.QueryEscape("https://playground.scrapingbytes.com/")
    fmt.Println(encodedUrl) // prints the URL-encoded string
}

Render JavaScript

By default, ScrapingBytes fetches the website with a headless browser and renders JavaScript. This default behavior costs 5 credits per request.

To fetch a website without the headless browser, set render_js to false. Please note, this falls back to using plain HTTP requests.
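For example, this payload disables the headless browser and is served over a plain HTTP request:

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": false
}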


Premium Proxy

By default, this is false and we use a datacenter proxy from our pool of proxies. When turned on, we fetch the website through our pool of residential proxies instead. Some websites that are harder to scrape require a premium proxy; if you're having issues scraping a target website, try turning this on.

Examples of websites where this might be required include search engines, social networks, and e-commerce websites.

Each request with this parameter costs 25 API credits when render_js is enabled, or 10 credits when used with plain HTTP requests.
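For example, this payload enables the premium proxy together with the headless browser, costing 25 credits per request:

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": true,
    "premium_proxy": true
}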


Instructions

By default, this is null. When instructions are sent, we perform the actions you specify in the browser. The value must be an array of instruction objects, and all instructions must complete before the timeout duration elapses.

Instructions are completed from the top of the list to the bottom.

All elements can be found by class, id, or xpath.

Below is an example of all available instructions.

{
    "url": "https://playground.scrapingbytes.com/",
    "render_js": true,
    "premium_proxy": true,
    "instructions": [
        {"wait_for": ".listing-clickbox"}, // Will wait for a max of 30 seconds
        {"click": "#load_more"},
        {"wait": 5}, // Time in seconds. Can be a float
        {"scroll_y": 500},
        {"scroll_x": 800},
        {"fill": [".selector", "My custom value"]},
    ]
}
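As a rough sketch, the same payload could be sent from Python as shown below. The comments in the JSON above are for illustration and must be removed before sending it as literal JSON. The endpoint URL and the api_key field here are assumptions, not confirmed by this page; check the authentication section of the documentation for the exact values.

import requests

payload = {
    "url": "https://playground.scrapingbytes.com/",
    "render_js": True,
    "premium_proxy": True,
    "instructions": [
        {"wait_for": ".listing-clickbox"},          # waits up to 30 seconds
        {"click": "#load_more"},
        {"wait": 5},                                # seconds, can be a float
        {"scroll_y": 500},
        {"scroll_x": 800},
        {"fill": [".selector", "My custom value"]},
    ],
}

# Assumed endpoint and auth field, shown for illustration only.
response = requests.post(
    "https://api.scrapingbytes.com/v1/scrape",      # assumed endpoint
    json={"api_key": "YOUR_API_KEY", **payload},    # assumed auth parameter
)
print(response.status_code)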