ScrapingBytes API - Documentation
Welcome to the official ScrapingBytes documentation. Our goal is to make web scraping as simple as possible: it shouldn't be hard, and neither should our implementation. To get started, you'll need to know a few things:
- Your API Key
- The URL you want to scrape
- Whether you need to render JavaScript
- Whether you want to use a premium/residential proxy
Finding Your API Key
To generate an API key, first head to the dashboard. The key is shown only once, so store it in a secure place and keep it private.
Copy the key and integrate it into your application. How to secure it is beyond the scope of this documentation, but Twilio has a brief video example that uses environment variables.
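One common way to keep the key out of source code is to read it from an environment variable. The sketch below assumes a variable named SCRAPINGBYTES_API_KEY; that name and the load_api_key helper are illustrative choices, not something the ScrapingBytes dashboard sets for you.

```python
import os

# Hypothetical variable name; use whatever name your deployment defines.
API_KEY_ENV_VAR = "SCRAPINGBYTES_API_KEY"


def load_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key stored in the given environment variable."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early so a missing key is not sent as an empty Bearer token.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

Failing at startup when the variable is missing tends to be easier to debug than a rejected request later on.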
Sending a Request
Now that you have your API key, you can send a POST request to our API, passing the key as a Bearer token in the Authorization header:
https://scrapingbytes.com/api/v1/scrape
Authorization: Bearer YOUR-API-KEY
You'll need to build a request body, and send it as JSON. At a minimum you'll need to provide the target URL.
{
"url": "https://books.toscrape.com/",
"render_js": true,
"premium_proxy": true
}
If you do not want to use the headless browser or premium proxies, set the corresponding values to false. Not sure how to build the request body? Try the Request Builder on the dashboard.
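The request described above could be assembled in Python roughly as follows, using only the standard library. This is a minimal sketch: the function names build_scrape_request and scrape are illustrative, the Content-Type header is an assumption based on the body being JSON, and the 140-second client timeout is an arbitrary value.

```python
import json
import urllib.request

API_URL = "https://scrapingbytes.com/api/v1/scrape"


def build_scrape_request(api_key, url, render_js=True, premium_proxy=False):
    """Build (but do not send) the POST request with a JSON body."""
    body = json.dumps(
        {"url": url, "render_js": render_js, "premium_proxy": premium_proxy}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed: the API expects the usual JSON content type.
            "Content-Type": "application/json",
        },
    )


def scrape(api_key, url, **options):
    """Send the request and return (html, response_headers)."""
    request = build_scrape_request(api_key, url, **options)
    # Client-side timeout in seconds; 140 is an arbitrary choice.
    with urllib.request.urlopen(request, timeout=140) as response:
        html = response.read().decode("utf-8", errors="replace")
        return html, response.headers
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so callers of scrape would typically wrap it in a try/except block.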
The API will return the raw HTML from the target URL:
<html>
<head>
...
</head>
<body>
...
</body>
</html>
To find out whether a request succeeded and how many credits it cost, check the response headers:
- SB-Credit-Cost: a number with the total credits used.
- SB-Success: a boolean indicating whether the request was successful.
In this case the response headers would be:
Key | Value |
---|---|
SB-Credit-Cost | 25 |
SB-Success | true |
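Reading these two headers in client code might look like the sketch below. It assumes the values arrive as the text "25" and "true", as in the example; parse_sb_headers is a hypothetical helper name, and the case-insensitive lookup is a defensive choice, not documented API behavior.

```python
def parse_sb_headers(headers):
    """Return (success, credit_cost) from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    success = lowered.get("sb-success", "").strip().lower() == "true"
    cost_text = lowered.get("sb-credit-cost", "").strip()
    # None signals that the header was missing or not a plain number.
    credit_cost = int(cost_text) if cost_text.isdigit() else None
    return success, credit_cost
```

For the example headers, the helper yields a success flag of True and a cost of 25; a response without the headers yields False and None.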
Controls
These are the JSON parameters you can provide to the browser to control it.
Please note that if render_js is disabled, we send an HTTP request instead of using a headless browser.
Name | Type | Default | Description |
---|---|---|---|
url | string | required | The URL of the page you want to scrape |
render_js | bool | true | Fetch the website with a headless browser and render JavaScript |
premium_proxy | bool | false | Use a premium proxy to bypass difficult websites. This uses a pool of residential proxies |
screenshot | bool | false | Return a screenshot of the page you want to scrape, encoded as base64. The API attempts to capture the full page |
mobile | bool | false | Send the request as a mobile device. Defaults to desktop |
block_resources | bool | true | Block resources such as images |
block_ads | bool | true | Block ads on the page you want to scrape |
window_height | int | 1080 | Height in pixels of the browser window and viewport used to scrape the page |
window_width | int | 1920 | Width in pixels of the browser window and viewport used to scrape the page |
timeout | int | 130 | How long to wait, in seconds, before timing out your request. Maximum is 130 seconds |
Want to quickly build the JSON required? Try the request builder on the dashboard.
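If you prefer to build the JSON in code, a small validating helper along these lines may catch mistakes before a request is sent. The parameter names, types, defaults, and the 130-second maximum are taken from the table above; the build_payload function and its error handling are illustrative assumptions and do not reflect how the API validates input.

```python
# Parameter names and defaults copied from the controls table.
CONTROL_DEFAULTS = {
    "render_js": True,
    "premium_proxy": False,
    "screenshot": False,
    "mobile": False,
    "block_resources": True,
    "block_ads": True,
    "window_height": 1080,
    "window_width": 1920,
    "timeout": 130,
}
MAX_TIMEOUT_SECONDS = 130


def build_payload(url, **overrides):
    """Return a request body containing url plus any explicit overrides."""
    if not isinstance(url, str) or not url:
        raise ValueError("url is required and must be a non-empty string")
    payload = {"url": url}
    for name, value in overrides.items():
        if name not in CONTROL_DEFAULTS:
            raise ValueError(f"unknown parameter: {name}")
        expected = type(CONTROL_DEFAULTS[name])
        # bool is a subclass of int in Python, so compare exact types.
        if type(value) is not expected:
            raise TypeError(f"{name} must be of type {expected.__name__}")
        payload[name] = value
    if payload.get("timeout", MAX_TIMEOUT_SECONDS) > MAX_TIMEOUT_SECONDS:
        raise ValueError(f"timeout cannot exceed {MAX_TIMEOUT_SECONDS} seconds")
    return payload
```

Only the parameters you pass explicitly are included in the body; anything omitted is left to the API's defaults.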
Credit Costs
Every ScrapingBytes plan includes a set number of credits to use per month.
The credits used per request depend on the parameters you provide with your API calls. A single request costs between 1 and 25 credits.
A breakdown of credit costs:
Feature | Credit Cost |
---|---|
Rotating Proxy without JavaScript rendering | 1 |
Rotating Proxy with JavaScript rendering | 5 |
Premium Proxy without JavaScript rendering | 10 |
Premium Proxy with JavaScript rendering | 25 |
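To estimate usage before sending requests, the breakdown above can be expressed as a lookup. This sketch assumes that "Rotating Proxy" corresponds to premium_proxy set to false and that no other parameter changes the cost; neither point is stated explicitly here, so treat the SB-Credit-Cost response header as the authoritative figure.

```python
def credit_cost(render_js=True, premium_proxy=False):
    """Estimate the credits for one request from the cost breakdown."""
    # Keys are (render_js, premium_proxy); values come from the table.
    costs = {
        (False, False): 1,   # rotating proxy, no JavaScript rendering
        (True, False): 5,    # rotating proxy, JavaScript rendering
        (False, True): 10,   # premium proxy, no JavaScript rendering
        (True, True): 25,    # premium proxy, JavaScript rendering
    }
    return costs[(bool(render_js), bool(premium_proxy))]
```

With the documented defaults (render_js true, premium_proxy false), the estimate is 5 credits per request.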
Response Status Codes
Every response from the ScrapingBytes API returns an HTTP status code with a specific or general reason. You can find all status codes and their meanings below:
Code | Charged | Meaning | Solution |
---|---|---|---|
200 | Yes | Successful request | |
204 | No | No content | There was a problem fetching the content. Please try again. |
400 | No | Bad request | Invalid parameters provided. Check the response message for further details. |
402 | No | Not enough credits | Please upgrade your plan, or contact support/sales for assistance. |
429 | No | Too many concurrent connections | Please upgrade your plan, or contact support/sales for assistance. |
404 | Yes | Invalid URL | You provided an invalid URL or we couldn't find the requested URL. |
500 | No | Error | Please try again, an unknown error has occurred. |
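A client can encode the table above to decide how to react to each response. In this sketch, describe_status is a hypothetical helper, and treating 204 and 500 as retryable is an interpretation of the "Please try again" guidance, not a documented retry policy.

```python
# code -> (charged, meaning), copied from the status code table.
STATUS_TABLE = {
    200: (True, "Successful request"),
    204: (False, "No content"),
    400: (False, "Bad request"),
    402: (False, "Not enough credits"),
    404: (True, "Invalid URL"),
    429: (False, "Too many concurrent connections"),
    500: (False, "Error"),
}
# Assumption: only the codes whose solution says "try again" are retried.
RETRYABLE = {204, 500}


def describe_status(code):
    """Summarize a status code: whether it was charged and worth retrying."""
    charged, meaning = STATUS_TABLE.get(code, (None, "Undocumented status code"))
    return {
        "code": code,
        "charged": charged,  # None when the code is not in the table
        "meaning": meaning,
        "retry": code in RETRYABLE,
    }
```

For example, describe_status(404) reports a charged, non-retryable request, which is a reason to validate URLs before sending them.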