Set up your Batch API
Step-by-step guide to setting up your Similarweb Batch API
Welcome to Similarweb's Batch API - giving you scalable access to the world's largest digital measure database!
Get Similarweb data for more than 1,000,000 domains, 5 years of history, and dozens of metrics in one API call!
This guide has 2 quick steps to get millions of data points from our API.
Get started checklist:
- 🔑 Get an API key (see the tutorial) or, if you already have Batch API access, generate an API key here
- 📈 Choose the data and metrics you need based on your subscription at datahub.similarweb.com, or discover new datasets here
- 📝 Create a report request with valid JSON
- 🔗 Connect and integrate with your data lake (S3, Snowflake, Databricks)
Step-by-step guide:
- Make a POST request with the JSON either in the body or attached as a file (multipart/form-data):
https://api.similarweb.com/batch/v4/request-report
import requests

url = "https://api.similarweb.com/batch/v4/request-report"
headers = {"api-key": "{{your_api_key}}"}

# Attach the JSON request as a multipart/form-data file part named "request".
# The "with" block closes the file once the upload has finished.
with open("/Users/Batchexample.json", "rb") as request_file:
    files = [("request", ("Batchexample.json", request_file, "application/json"))]
    response = requests.post(url, headers=headers, files=files)

print(response.text)
Example JSON:
{
  "delivery_information": {
    "response_format": "csv"
  },
  "report_query": {
    "tables": [
      {
        "vtable": "traffic_and_engagement",
        "granularity": "monthly",
        "filters": {
          "domains": [
            "similarweb.com",
            "api.similarweb.com"
          ],
          "countries": [
            "WW",
            "US"
          ],
          "include_subdomains": true
        },
        "metrics": [
          "all_traffic_visits",
          "desktop_new_visitors",
          "desktop_pages_per_visit",
          "desktop_returning_visitors"
        ],
        "start_date": "2023-02",
        "end_date": "2024-02"
      }
    ]
  }
}
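As a rough sketch, the same request can be assembled in code instead of being written by hand. The helper name `build_report_request` and its argument names are illustrative only and are not part of the Similarweb API; the structure it produces simply mirrors the example JSON above.

```python
import json


def build_report_request(domains, countries, metrics, start_date, end_date,
                         vtable="traffic_and_engagement",
                         granularity="monthly", response_format="csv"):
    """Assemble a request body with the same shape as the example JSON.

    Hypothetical helper: only the key names come from the example above.
    """
    return {
        "delivery_information": {"response_format": response_format},
        "report_query": {
            "tables": [
                {
                    "vtable": vtable,
                    "granularity": granularity,
                    "filters": {
                        "domains": list(domains),
                        "countries": list(countries),
                        "include_subdomains": True,
                    },
                    "metrics": list(metrics),
                    "start_date": start_date,
                    "end_date": end_date,
                }
            ]
        },
    }


body = build_report_request(
    domains=["similarweb.com", "api.similarweb.com"],
    countries=["WW", "US"],
    metrics=["all_traffic_visits", "desktop_new_visitors"],
    start_date="2023-02",
    end_date="2024-02",
)

# Serialize to JSON text, ready to save to a file or send in a request body.
request_json = json.dumps(body, indent=2)
print(request_json)
```

The resulting text can be saved to a file and attached as multipart/form-data, or the dictionary can be passed to `requests.post(..., json=body)` if you send the JSON in the body.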
When requesting a report, you must include the following parameters in the JSON.
Mandatory Parameters:
Parameters | Description | Acceptable Values |
---|---|---|
vtable | The dataset to choose metrics from. You can find the full list on datahub.similarweb.com or here | traffic_and_engagement |
domains | Domain names can include letters, numbers, and hyphens. One request can include up to 1M domains. | amazon.com |
countries | Standard 2-letter ISO country codes, used when calling all metrics except desktop_top_geo. For worldwide, use "WW". This parameter is case-sensitive and must be entered in capital letters. When calling desktop_top_geo, remove countries from your JSON file. | WW, US, GB (all country codes) |
metrics | List of metrics per dataset | all_traffic_visits |
start_date, end_date | For daily granularity, format the start and end dates as YYYY-MM-DD. For monthly granularity, format them as YYYY-MM. | Daily: 2023-06-30; Monthly: 2023-06 |
granularity | Time series granularity | monthly, weekly, daily |
response_format | Output of the API call | JSON, csv, parquet, orc |
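The rules in the table above can be checked locally before a request is sent. The following is a minimal sketch, not an official validator: `validate_table` and its messages are hypothetical, and it only covers the rules stated in the table (the weekly date format is not documented there, so weekly dates are left unchecked).

```python
import re

# Date formats taken from the table above. The weekly format is not
# documented there, so weekly dates are not checked.
DATE_FORMATS = {
    "daily": r"\d{4}-\d{2}-\d{2}",
    "monthly": r"\d{4}-\d{2}",
}
MANDATORY_KEYS = ("vtable", "granularity", "metrics", "start_date", "end_date")
MAX_DOMAINS = 1_000_000


def validate_table(table):
    """Return a list of problems found in one report_query.tables entry."""
    problems = [f"missing parameter: {key}"
                for key in MANDATORY_KEYS if key not in table]

    filters = table.get("filters", {})
    domains = filters.get("domains", [])
    countries = filters.get("countries", [])
    if not domains:
        problems.append("filters.domains must list at least one domain")
    if len(domains) > MAX_DOMAINS:
        problems.append("a request can include at most 1M domains")

    # desktop_top_geo must be requested without a countries filter.
    if "desktop_top_geo" in table.get("metrics", []) and countries:
        problems.append("remove countries when requesting desktop_top_geo")
    for code in countries:
        if not re.fullmatch(r"[A-Z]{2}", code):
            problems.append(f"country code must be 2 capital letters: {code!r}")

    date_format = DATE_FORMATS.get(table.get("granularity"))
    if date_format:
        for key in ("start_date", "end_date"):
            value = table.get(key)
            if value is not None and not re.fullmatch(date_format, value):
                problems.append(
                    f"{key} does not match {table['granularity']} format: {value!r}"
                )
    return problems


example_table = {
    "vtable": "traffic_and_engagement",
    "granularity": "monthly",
    "filters": {"domains": ["similarweb.com"], "countries": ["WW", "US"]},
    "metrics": ["all_traffic_visits"],
    "start_date": "2023-02",
    "end_date": "2024-02",
}
print(validate_table(example_table))
```

An empty list means none of the checked rules were violated; it does not guarantee that the API will accept the request.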
Make sure to save the report ID you receive after your API request.
The request limit per user is 100 pending requests. If you receive a '429' error, you have exceeded the limit of allowed pending requests. Reduce the frequency of your requests to stay within the limits of your account.
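One way to reduce request frequency after a '429' is to retry with increasing delays, which gives pending reports time to finish. The sketch below is an assumption about client-side handling, not documented Similarweb behavior: `send_with_backoff`, the retry count, and the delays are all illustrative. The actual HTTP call is passed in as a function so the retry logic stays independent of how you send the request.

```python
import time


def send_with_backoff(send, max_attempts=5, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send        -- zero-argument callable returning an object with .status_code
    base_delay  -- seconds to wait after the first 429; doubles on each retry
    sleep       -- injectable for testing; defaults to time.sleep
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        # Wait before the next attempt, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return response
```

With the `requests` library, this could be used as `send_with_backoff(lambda: requests.post(url, headers=headers, json=body))`.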
Optional Parameters:
Parameter | Description | Acceptable values |
---|---|---|
delivery_method | The default value is "download_link". When the delivery method is set to "snowflake", the "response_format" field is not required. | download_link, bucket_access, snowflake |
delivery_method_params | Use this when requesting reports to be delivered to aggregated Snowflake tables. Input "table_name": "your_table_name". See the set-up guide for more details. | table_name, integration_name, retention_days, overwrite_partitions |
all_history | Boolean. When set to true, automatically overrides the dates with the minimum start date and maximum end date. Default is false. | true/false |
latest | Boolean. When set to true, overrides the end date with the latest available date. If the start date is not specified, it is also set to the latest available date. | true/false |
window_size | String. When set, overrides the start date with a time relative to the end date. | Format: {number}{y/m/d}, for example '12d', '3m', or '2y' |
limit | Integer. Limits the number of results per entity selected. | Greater than 0; the default for most metrics is 100 |
include_subdomains | Boolean. Default is true. | true/false |
webhook_url | Enter the delivery URL you'd like us to ping when the status of your report changes. | URL |
sort | Allows you to sort by a specific metric | specific metrics: "sort": "all_traffic_visits" |
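The `window_size` format in the table above, `{number}{y/m/d}`, is easy to get wrong, so it can help to check the string before sending it. This is a small sketch with a hypothetical helper name, `parse_window_size`; it only checks the documented format and does not know which ranges your subscription allows.

```python
import re

UNITS = {"y": "years", "m": "months", "d": "days"}


def parse_window_size(value):
    """Split a window_size string such as '12d' into (number, unit name).

    Raises ValueError if the string is not {number}{y/m/d}.
    """
    match = re.fullmatch(r"(\d+)([ymd])", value)
    if match is None:
        raise ValueError(f"window_size must look like '12d', '3m', or '2y': {value!r}")
    number, unit = match.groups()
    return int(number), UNITS[unit]


print(parse_window_size("12d"))
print(parse_window_size("3m"))
print(parse_window_size("2y"))
```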
- After you make your request and receive your report ID, use the Request Report Status endpoint to check the report status.
Request the report status to find out when the report has completed.
GET Request Report Status
https://api.similarweb.com/v3/batch/request-status/{{generated_report_id}}
import requests

# Replace {{generated_report_id}} with the report ID returned by the POST request.
url = "https://api.similarweb.com/v3/batch/request-status/{{generated_report_id}}"
headers = {"api-key": "{{your_api_key}}"}

response = requests.get(url, headers=headers)
print(response.text)
Example responses (completed report, then pending report):
{
  "data_points_count": 1779429,
  "download_url": "example_url.com",
  "status": "completed",
  "used_quota": 35589
}
{
  "status": "pending"
}
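Since a report starts as "pending", the status usually has to be requested more than once. The following sketch polls until the status changes; `wait_for_report`, the polling interval, and the maximum number of checks are illustrative assumptions. Only the "pending" and "completed" values appear in the examples above, so any other status is simply returned to the caller to inspect.

```python
import time


def wait_for_report(fetch_status, poll_seconds=60, max_polls=120, sleep=time.sleep):
    """Poll until the report is no longer pending and return the last response.

    fetch_status -- zero-argument callable returning the status JSON as a dict
    sleep        -- injectable for testing; defaults to time.sleep
    """
    for poll in range(max_polls):
        status = fetch_status()
        # Any status other than "pending" is treated as final.
        if status.get("status") != "pending":
            return status
        # Wait before the next check, but not after the final one.
        if poll < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError("report is still pending after the maximum number of checks")
```

With the `requests` library, `fetch_status` could be `lambda: requests.get(url, headers=headers).json()`, and the download link of a completed report is then available as `result["download_url"]`.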
The download link remains valid for 30 days. We recommend saving the report ID and download link for some time in case you need our assistance troubleshooting an issue.
Data credits cost per request:
Data credits are calculated for each report based on the number of results you are actually receiving:
Formula: Number of domains X Number of metrics X history X cadence (daily/monthly) X Number of countries X Number of results
To estimate how many credits a report will cost, you can use the "request-validate" endpoint.
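For a quick back-of-the-envelope figure, the formula above can be written as a small function. This is a sketch under assumptions: `estimate_credits` is a hypothetical helper, "history X cadence" is read as the number of time periods in the date range, and the range is assumed to include both the start and end period. Because credits are based on the results actually received, the real cost may differ, and the "request-validate" endpoint remains the authoritative source.

```python
def estimate_credits(domains, metrics, periods, countries, results_per_entity=1):
    """Multiply the factors of the credits formula.

    periods -- number of time periods (history x cadence), e.g. months or days
    """
    return domains * metrics * periods * countries * results_per_entity


# Example JSON from this guide: 2 domains, 4 metrics, 2 countries, and
# monthly data from 2023-02 to 2024-02 (13 months if both ends are included).
estimate = estimate_credits(domains=2, metrics=4, periods=13, countries=2)
print(estimate)  # 2 * 4 * 13 * 2 * 1 = 208
```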