How To Use ScrapingBytes

New to ScrapingBytes? We’re a service that lets you web scrape at scale with a headless browser, plain requests, premium proxies, or datacenter-based proxies. All you need to do is send a POST request to our API. We fetch the website, render the JavaScript when you use the headless browser, and return the result as HTML. From there, parsing out the data you need is up to you. Our technology also helps you avoid detection, though you’ll need to experiment with the settings we offer to see what bypasses detection on a given site. With that, let’s dive into how to use ScrapingBytes.

To get started you’ll need an account. You can sign up for free and get 1,000 credits to start! Once signed up, generate an API key: on the dashboard, press the Generate button in the card asking if you need an API key.

[Screenshot: the dashboard card for generating an API key, with its Generate button]

This will reveal the API key. Write it down somewhere secure, as it will only be shown once.

[Screenshot: the newly generated API key displayed on the dashboard]

Now you’re ready to start using ScrapingBytes. This guide assumes you’re using Python, but you can do the same with cURL or any programming language that supports HTTP requests.

Let’s start off by creating a new Python project with a virtual environment, using venv or Poetry (or, if you’re using PyCharm, create it through there). Then create a file with whatever name you’d like and add the following code:

SB_API_KEY = 'YOUR-API-KEY-HERE'


def scrape():
    print('Starting to scrape...')

if __name__ == '__main__':
    scrape()

We have our entry point for the application, which simply calls scrape, plus a constant for the ScrapingBytes API key. You’ll want to replace the placeholder with your own key. Typically you’d store the API key securely as an environment variable; for this example, though, let’s keep it short and sweet.
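
If you’d rather take the environment-variable route, a minimal sketch looks like this (the variable name SB_API_KEY is our own choice for illustration, not anything ScrapingBytes requires):

import os

# Read the key from the environment instead of hardcoding it.
# SB_API_KEY is an arbitrary variable name; pick whatever fits your setup.
SB_API_KEY = os.environ.get('SB_API_KEY')
if SB_API_KEY is None:
    raise RuntimeError('Set the SB_API_KEY environment variable before running')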

Sending The Request

Next, we’re going to want to install the requests library. We can do this by running the following command with our virtual environment activated: pip install requests.

This installs the most recent version of requests, which simplifies calling the ScrapingBytes API. With it in place, let’s call the scrape endpoint, which returns the HTML of the target website. We want to use the headless browser and a premium proxy, so our code looks like the following:

import requests


SB_API_KEY = 'YOUR-API-KEY-HERE'


def scrape():
    print('Starting to scrape...')

    # Build the payload
    data = {
        "url": "https://books.toscrape.com",
        "browser": True,
        "premium_proxy": True
    }

    # Auth required for ScrapingBytes
    headers = {
        'Authorization': f'Bearer {SB_API_KEY}',
        'Content-Type': 'application/json',
        'Accept': 'application/json'
    }

    # Send the request to ScrapingBytes
    resp = requests.post('https://scrapingbytes.com/api/v1/scrape', json=data, headers=headers)

    # Print the HTML returned from the scrape endpoint
    print(resp.text)

if __name__ == '__main__':
    scrape()

Let’s explain this a bit. ScrapingBytes takes a JSON payload that controls either a headless browser or a plain request instance. Here we build the data dictionary, which must always include at least the target URL. In this case we’re targeting books.toscrape.com and using a headless browser with a premium proxy.
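
For reference, the smallest valid payload is just the target URL. Here is a quick sketch, assuming browser and premium_proxy simply default to off when omitted (check the documentation for the authoritative defaults):

# Minimal payload: only the target URL is required.
# Assumption: browser and premium_proxy default to off when left out.
minimal_data = {
    "url": "https://books.toscrape.com"
}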

To complete this request you must provide your API key as a bearer token; the key is what ties the request to your account. If you run the script you should see it work. If not, make sure you’ve provided the correct API key and set the Content-Type and Accept headers. Without those headers you’ll be redirected to the login page instead of getting JSON back.
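
To catch a bad key or missing headers early, you can check the status code right after the requests.post call in scrape, before touching the body. A small defensive sketch (we’re assuming anything other than 200 means failure; see the docs for the actual status codes the API returns):

# Fail fast on auth or header problems instead of parsing an
# unexpected login page or error body.
if resp.status_code != 200:
    raise RuntimeError(f'Scrape failed with status {resp.status_code}: {resp.text[:200]}')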

Parsing HTML

You’ve got things working successfully! Now you’ll need to write your own code to parse the HTML, which can be done with a library such as BeautifulSoup. Want to see a full guide on this? Let us know!
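
In the meantime, here’s a quick sketch that pulls book titles out of the HTML returned for books.toscrape.com using BeautifulSoup (install it with pip install beautifulsoup4; the h3 > a selector matches that site’s current markup, which could change):

from bs4 import BeautifulSoup

# Parse the HTML string returned by the scrape endpoint.
soup = BeautifulSoup(resp.text, 'html.parser')

# On books.toscrape.com each book title sits in an <a> tag inside an <h3>;
# the full title lives in the tag's title attribute.
for link in soup.select('h3 > a'):
    print(link.get('title'))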

The Dashboard

When you send these requests you should see the connection, request, and transaction information on the dashboard. It may look like the following:

[Screenshot: request history on the dashboard]

This shows information about the credits used, the target endpoint, the domain, target URL, and the duration of the request.

Wrapping Up

That’s it! You’ve successfully created a basic script that interacts with ScrapingBytes. You can easily extend it, and our software helps keep you from being blocked along the way. You can find all the available documentation on our website, including a breakdown of credit costs, response status codes, and how to get started.

Implementation In Different Languages

Go

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
)

const SbApiKey = "YOUR-API-KEY-HERE"

// ScrapeRequest for request payload
type ScrapeRequest struct {
	URL          string `json:"url"`
	Browser      bool   `json:"browser"`
	PremiumProxy bool   `json:"premium_proxy"`
}

func scrape() {
	fmt.Println("Starting to scrape...")

	// Build the payload
	data := ScrapeRequest{
		URL:          "https://books.toscrape.com",
		Browser:      true,
		PremiumProxy: true,
	}

	// Convert data to JSON
	payload, err := json.Marshal(data)
	if err != nil {
		log.Fatalf("Error marshalling JSON: %v", err)
	}

	// Create the request
	req, err := http.NewRequest("POST", "https://scrapingbytes.com/api/v1/scrape", bytes.NewBuffer(payload))
	if err != nil {
		log.Fatalf("Error creating request: %v", err)
	}

	// Set the Authorization header
	req.Header.Set("Authorization", "Bearer "+SbApiKey)
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")

	// Send the request
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		log.Fatalf("Error sending request: %v", err)
	}
	defer resp.Body.Close()

	fmt.Printf("Status code: %d\n", resp.StatusCode)

	// Read and print the response
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		log.Fatalf("Error reading response body: %v", err)
	}

	// Print the response body
	fmt.Println(string(body))
}

func main() {
	scrape()
}