Set up your Batch API
Step-by-step guide to setting up your Similarweb Batch API
Welcome to Similarweb's Batch API - giving you scalable access to the world's largest digital measure database!
Get Similarweb data for more than 1,000,000 domains, 5 years of history, and tens of metrics in one API call!
This guide has 2 quick steps to get millions of data points from our API.
Get started checklist:
- 🔑 Get an API Key
- 📈 Choose the data and metrics you need (Batch API Datasets or discovery endpoints)
- 📝 Create a request report with valid JSON
- 🔗 Connect and integrate with your data lake (S3, Snowflake, Databricks)
After you set up your API key, you can create your first API call for up to 1,000,000 domains!
Step-by-step guide:
- Make a POST request with the JSON in the body or attached as a file using multipart/form-data:
https://api.similarweb.com/v3/batch/request-report
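import requests

url = "https://api.similarweb.com/v3/batch/request-report"
headers = {"api-key": "{{your_api_key}}"}

# Attach the request JSON as a multipart/form-data file part named "request".
# The "with" block closes the file once the upload has finished.
with open("/Users/Batchexample.json", "rb") as f:
    files = [("request", ("Batchexample.json", f, "application/json"))]
    response = requests.post(url, headers=headers, files=files)

# The response contains the report ID; save it for the status check.
print(response.text)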
To estimate how many credits a report will cost, use the "request-validate" endpoint.
Example JSON:
{
    "domains": [
        "cnn.com"
    ],
    "countries": ["US"],
    "metrics": [
        "all_traffic_visits"
    ],
    "start_date": "2022-12-01",
    "end_date": "2023-01-01",
    "granularity": "daily",
    "delivery_method": "download_link",
    "response_format": "json",
    "webhook_url": "foo.com"
}
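As a minimal sketch of the other option mentioned in the first step (sending the JSON directly in the request body rather than as a file), the request could be built as follows. It uses Python's standard-library `urllib` rather than `requests` so that it has no third-party dependency. The helper name `build_report_request` is hypothetical, and the `Content-Type: application/json` header is an assumption, not something this guide specifies.

```python
import json
import urllib.request

REPORT_URL = "https://api.similarweb.com/v3/batch/request-report"


def build_report_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the JSON in its body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        REPORT_URL,
        data=body,
        headers={
            "api-key": api_key,
            # Assumption: the API accepts a raw JSON body with this content type.
            "Content-Type": "application/json",
        },
        method="POST",
    )


example_payload = {
    "domains": ["cnn.com"],
    "countries": ["US"],
    "metrics": ["all_traffic_visits"],
    "start_date": "2022-12-01",
    "end_date": "2023-01-01",
    "granularity": "daily",
    "response_format": "json",
}
request = build_report_request(example_payload, "{{your_api_key}}")

# To actually send the request:
# with urllib.request.urlopen(request) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the body and headers before spending any credits.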
When requesting a report, the JSON must include the parameters listed under "Mandatory Parameters" below.
Make sure to save the report ID you receive after your API request.
Data credits are calculated for each report based on the number of results you actually receive:
Formula: Number of domains × Number of metrics × history × cadence (daily/monthly) × Number of countries × Number of results
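As a worked example of the formula above, the sketch below simply multiplies the factors together. It assumes that "history × cadence" means the number of time periods in the report (for example, 12 for one year at monthly granularity). That reading, and the function name `estimate_report_size`, are assumptions; the "request-validate" endpoint remains the way to get the actual estimate.

```python
def estimate_report_size(domains: int, metrics: int, periods: int,
                         countries: int, results_per_entity: int = 1) -> int:
    """Multiply the factors of the credit formula.

    `periods` is an assumed reading of "history x cadence": the number of
    daily or monthly time periods covered by the report.
    """
    return domains * metrics * periods * countries * results_per_entity


# 100 domains, 2 metrics, 12 monthly periods, 3 countries, 1 result each.
estimate = estimate_report_size(domains=100, metrics=2, periods=12, countries=3)
print(estimate)  # 7200
```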
Each user can have up to 100 pending requests. If you receive a '429' error, you have exceeded the limit of allowed pending requests. Reduce the frequency of your requests to stay within your account's limits.
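One possible way to reduce request frequency after a '429' is to retry with an increasing delay. The sketch below uses exponential backoff, which is a general technique and not something this guide prescribes. `send_with_backoff` is a hypothetical helper; it takes any zero-argument callable that returns an object with a `status_code` attribute (such as a `requests` response), so the delays and attempt count are illustrative values only.

```python
import time
from typing import Callable


def send_with_backoff(send: Callable[[], object], max_attempts: int = 5,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    Returns the last response received, which may still be a 429.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With `requests`, a call might look like `send_with_backoff(lambda: requests.post(url, headers=headers, json=payload))`. Because the limit applies to pending requests, longer delays that give earlier reports time to complete may be more useful than short ones.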
Mandatory Parameters:
Parameter | Description | Acceptable Values |
---|---|---|
domains | Characters in domain names can include letters, numbers, and hyphens. One request can include up to 1M domains. | amazon.com |
countries | Countries in standard 2-letter ISO encoding, for all metrics except desktop_top_geo. For worldwide, use "WW". This parameter is case-sensitive and must be entered in uppercase. When calling desktop_top_geo, you must remove all countries from your JSON file. | WW, US, GB (all country codes) |
metrics | List of metrics per dataset | all_traffic_visits |
start_date, end_date | For daily granularity, format the start and end dates as YYYY-MM-DD. For monthly granularity, format them as YYYY-MM. | Daily: 2023-06-30 Monthly: 2023-06 |
granularity | Time series granularity | monthly, weekly, daily |
response_format | Output format of the API call | json, csv, parquet, orc |
When requesting a report that includes multiple metrics, please create separate requests for metrics that aren't within the same Metric Group. Check the supported metrics datasets to verify which Metric Group each metric belongs to.
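The mandatory-parameter rules above can be checked locally before a request is sent. The following is a minimal sketch under stated assumptions: `find_request_problems` is a hypothetical helper, it checks only the rules stated in the table (required keys, uppercase country codes, and date format for daily and monthly granularity), and it ignores the documented exceptions such as omitting countries for desktop_top_geo.

```python
import re
from typing import List

REQUIRED = ("domains", "countries", "metrics", "start_date", "end_date",
            "granularity", "response_format")

# Date formats from the table; weekly is left unchecked because the
# guide does not state a format for it.
DATE_PATTERNS = {
    "daily": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "monthly": re.compile(r"\d{4}-\d{2}"),
}


def find_request_problems(payload: dict) -> List[str]:
    """Return a list of human-readable problems; empty if none were found."""
    problems = [f"missing parameter: {name}"
                for name in REQUIRED if name not in payload]

    for code in payload.get("countries", []):
        if code != code.upper():
            problems.append(f"country code must be uppercase: {code}")

    granularity = payload.get("granularity")
    pattern = DATE_PATTERNS.get(granularity)
    if pattern is not None:
        for name in ("start_date", "end_date"):
            value = payload.get(name)
            if value is not None and not pattern.fullmatch(str(value)):
                problems.append(
                    f"{name} does not match the {granularity} format: {value}")
    return problems
```

A payload such as the example JSON in this guide returns an empty list, while a lowercase country code or a `YYYY-MM` date combined with daily granularity is reported.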
Optional Parameters:
Parameter | Description | Acceptable values |
---|---|---|
delivery_method | Default is "download_link". When the delivery method is set to "snowflake", the "response_format" field is not required. | download_link, bucket_access, snowflake |
delivery_method_params | Use this when requesting reports to be delivered to aggregated Snowflake tables. Input "table_name": "your_table_name". See the set-up guide for more details. | table_name |
all_history | Boolean. When set to true, overrides the dates with the minimum start date and maximum end date. Default is false. | true/false |
latest | Boolean. When set to true, overrides the end date with the latest available date. If the start date is not specified, it is also set to the latest available date. | true/false |
window_size | String. When set, overrides the start date with a date relative to the end date. | Format: {number}{y/m/d}, for example '12d', '3m', or '2y' |
limit | Integer. Limits the number of results per entity selected. | Greater than 0; the default for most metrics is 100 |
include_subdomains | Boolean. Default is true. | true/false |
webhook_url | Enter the delivery URL you'd like us to ping when the status of your report changes. | URL |
sort | Sorts the results by a specific metric. | A metric name, for example "sort": "all_traffic_visits" |
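To illustrate the `window_size` format described in the table, here is a small sketch that checks a value against the `{number}{y/m/d}` pattern before adding it to a request. The helper names `is_valid_window_size` and `with_window_size` are hypothetical, and whether the API accepts a leading zero or a value of 0 is not stated in this guide, so the check only enforces "digits followed by y, m, or d".

```python
import re

# One or more digits followed by a single unit letter: y, m, or d.
WINDOW_SIZE = re.compile(r"\d+[ymd]")


def is_valid_window_size(value: str) -> bool:
    """Return True if value looks like '12d', '3m', or '2y'."""
    return WINDOW_SIZE.fullmatch(value) is not None


def with_window_size(payload: dict, window_size: str) -> dict:
    """Return a copy of payload with window_size set, leaving the input untouched."""
    if not is_valid_window_size(window_size):
        raise ValueError(f"window_size must look like '12d', '3m' or '2y': {window_size!r}")
    updated = dict(payload)
    updated["window_size"] = window_size
    return updated
```

For example, `with_window_size(payload, "3m")` asks for a start date three months before the end date, per the table's description of how `window_size` overrides the start date.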
- After you have made your request and received your report ID, call the Request Report Status endpoint to check the status of the report.
Once the report is complete, the status response includes the download URL.
GET Request Report Status
https://api.similarweb.com/v3/batch/request-status/{{generated_report_id}}
import requests

url = "https://api.similarweb.com/v3/batch/request-status/{{generated_report_id}}"
headers = {"api-key": "{{your_api_key}}"}

# Ask for the current status of the report.
response = requests.get(url, headers=headers)
print(response.text)
Example responses:
Completed report:
{
    "data_points_count": 1779429,
    "download_url": "example_url.com",
    "status": "completed",
    "used_quota": 35589
}
Report still pending:
{
    "status": "pending"
}
The download link will remain valid for 30 days. We recommend saving these for a period of time in case you need our assistance troubleshooting an issue.
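Because a report can stay in the "pending" state for a while, the status check is often wrapped in a polling loop. The sketch below is one way to do that, not an official client: `wait_for_report` is a hypothetical helper, the interval and maximum number of checks are arbitrary illustrative values, and only the "pending" and "completed" statuses shown in the example responses are known, so any status other than "pending" is returned to the caller to inspect.

```python
import time
from typing import Callable


def wait_for_report(fetch_status: Callable[[], dict], interval: float = 30.0,
                    max_checks: int = 120,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_status() until the report is no longer pending.

    fetch_status is any zero-argument callable returning the parsed JSON
    of the Request Report Status response.
    """
    for check in range(max_checks):
        status = fetch_status()
        if status.get("status") != "pending":
            return status
        # Pause between checks, but not after the final one.
        if check < max_checks - 1:
            sleep(interval)
    raise TimeoutError(f"report still pending after {max_checks} status checks")
```

With `requests`, `fetch_status` could be `lambda: requests.get(url, headers=headers).json()`. Setting `webhook_url` in the request is an alternative to polling, since it asks the service to ping your URL when the report status changes.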