Amazon S3 Integration

Access your Similarweb Batch API reports via Amazon S3.

Access your Batch API reports directly from an Amazon S3 bucket maintained by Similarweb, so you can easily process your custom reports and automatically integrate them into your own database systems.

🚧

Compatible with Similarweb Batch API only

To get access, speak to a Similarweb representative.

Setup instructions

To access your reports securely via the S3 bucket, use the ‘Get-S3-Credentials’ endpoint to retrieve a personalized AWS access key.

Endpoint:

[GET] https://api.similarweb.com/v3/batch/s3-credentials

Response: 200 OK
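The credentials request can be sketched with Python's standard library. This is a minimal sketch under assumptions, not an official client: it assumes the API key is sent in an `api-key` request header, and it treats the response as opaque JSON rather than relying on specific field names. The helper names are hypothetical. Because the keys cannot be generated again, `save_credentials` writes the whole response to a new file that only the current user can read, and raises `FileExistsError` rather than overwrite an existing file.

```python
import json
import os
import urllib.request

CREDENTIALS_URL = "https://api.similarweb.com/v3/batch/s3-credentials"


def get_s3_credentials(api_key, opener=urllib.request.urlopen):
    """Call the Get-S3-Credentials endpoint and return the parsed JSON body.

    Assumption: the API key is passed in an "api-key" header.
    `opener` is injectable so the function can be tested without network access.
    """
    request = urllib.request.Request(
        CREDENTIALS_URL, headers={"api-key": api_key}, method="GET"
    )
    with opener(request) as response:
        return json.load(response)


def save_credentials(credentials, path):
    """Persist the credentials once, since they cannot be generated again.

    O_EXCL makes this fail if `path` already exists; mode 0o600 restricts
    the file to owner read/write on POSIX systems.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump(credentials, handle, indent=2)
```

For example: `save_credentials(get_s3_credentials("YOUR_API_KEY"), "similarweb_s3_credentials.json")`.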

Example flow:

  1. Use the Get-S3-Credentials endpoint to generate AWS credentials with READ access to all files under “s3_bucket” and “s3_prefix”.

Note: Save the AWS keys securely right away, as you will not be able to generate them again.

  2. Request a report via the Request Report endpoint with "delivery_method": "bucket_access". Take note of the generated “report_id”.

  3. Use the Request Report Status endpoint to check whether the report is ready. Once it is, use the “S3_path” link instead of the “download_url” link.

  4. Use the AWS credentials with any AWS client (e.g., awscli or boto3).

Your report will be available under the {s3_bucket}/{s3_prefix}/{report_id} folder.
For example, s3://web-bulk-api-reports-production-us-east-1/123414/070f4e79-5248-487e-a43e-62e51309610c

  5. List the files under the report path and use them as you wish.
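The report location described in the flow above can be handled with two small helpers. This sketch assumes the “S3_path” value is an s3:// URI shaped like the example path; `report_uri` and `split_s3_uri` are hypothetical names. The prefix returned by `split_s3_uri` ends in `/` so that listing it does not also match a sibling folder whose name merely starts with the same characters.

```python
from urllib.parse import urlparse


def report_uri(s3_bucket, s3_prefix, report_id):
    """Build the report folder URI: s3://{s3_bucket}/{s3_prefix}/{report_id}."""
    prefix = s3_prefix.strip("/")
    return f"s3://{s3_bucket}/{prefix}/{report_id}"


def split_s3_uri(uri):
    """Split an s3:// URI (such as the S3_path value) into (bucket, key_prefix).

    The key prefix gets a trailing "/" so a listing stays inside this folder.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    key_prefix = parsed.path.lstrip("/")
    if key_prefix and not key_prefix.endswith("/"):
        key_prefix += "/"
    return parsed.netloc, key_prefix
```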

🚧

S3 credentials expire after 1 year

Once they expire, revoke the connection using the Revoke S3 endpoint, then repeat the setup instructions above to renew your credentials.

FAQs

What if I have multiple report files?

The Batch API splits the report into multiple files based on the primary key of the requested data, as detailed in the Batch API documentation.

For example, the primary key for the ‘desktop_visits’ metric is (‘site’, ‘country’), and for the ‘desktop_top_geo’ metric it is (‘site’).

Requesting both metrics in the same request results in two report folders:

  • s3://web-bulk-api-reports-production-us-east-1/123414/070f4e79-5248-487e-a43e-62e51309610c/keys=site-country

  • s3://web-bulk-api-reports-production-us-east-1/123414/070f4e79-5248-487e-a43e-62e51309610c/keys=site
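To process each primary-key group separately, object keys can be grouped by their `keys=` folder. A minimal sketch, assuming folder names follow the `keys=site-country` pattern shown above and that column names contain no hyphens; `group_by_primary_key` is a hypothetical helper, and the file name in its docstring is made up for illustration.

```python
from collections import defaultdict


def group_by_primary_key(keys):
    """Group S3 object keys by the keys=... folder that encodes their primary key.

    E.g. ".../keys=site-country/part-0000" (hypothetical file name) is grouped
    under ("site", "country"). Keys outside any keys=... folder go under None.
    Assumes column names themselves contain no "-".
    """
    groups = defaultdict(list)
    for key in keys:
        primary_key = None
        for part in key.split("/"):
            if part.startswith("keys="):
                primary_key = tuple(part[len("keys="):].split("-"))
                break
        groups[primary_key].append(key)
    return dict(groups)
```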

Which files should be ignored?

Ignore any files whose names start with an underscore (‘_’).

Python Example:
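A minimal sketch with boto3, assuming you pass in the access key ID and secret access key returned by Get-S3-Credentials and that the bucket is in us-east-1 (as the example bucket name suggests). It lists everything under the report prefix with the `list_objects_v2` paginator, since a single call returns at most 1,000 keys, and skips objects whose file name starts with an underscore. The helper names are hypothetical.

```python
import posixpath


def is_report_file(key):
    """Return False for objects to ignore: file names starting with "_"."""
    return not posixpath.basename(key).startswith("_")


def list_report_files(s3_client, bucket, prefix):
    """List every data file under a report prefix, skipping "_" files.

    `s3_client` is a boto3 S3 client; the paginator follows continuation
    tokens so reports with more than 1,000 objects are listed completely.
    """
    paginator = s3_client.get_paginator("list_objects_v2")
    files = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if is_report_file(obj["Key"]):
                files.append(obj["Key"])
    return files


def make_s3_client(access_key_id, secret_access_key, region_name="us-east-1"):
    """Create a boto3 S3 client from the Get-S3-Credentials keys.

    Assumption: the bucket lives in us-east-1.
    """
    import boto3  # imported lazily so the helpers above work without boto3

    return boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
        region_name=region_name,
    )
```

For example, after `client = make_s3_client(key_id, secret)`, each listed key can be fetched with `client.download_file(bucket, key, local_path)`.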

📘

Need help?

If you are experiencing any issues with your AWS credentials or would like to speak to one of our dedicated technical API specialists, please reach out to your Account Manager.

Requesting reports: First-time reports and defining your report structure (Databricks compatible)

When requesting a Similarweb Batch API report, specify where you’d like the data delivered using the create-table endpoint. Include the full schema and the report query parameters.

Using the create-table endpoint:

https://api.similarweb.com/batch/v4/integration/create-table

Add the delivery information to the request body, for example:

{
    "delivery_information": {
        "delivery_method": "bucket_access",
        "delivery_method_params": {
            "table_name": "{{table_name}}"
        }
    },
    "report_query": {
        "tables": [
            {
                "vtable": "traffic_and_engagement",
                "granularity": "monthly",
                "all_history": true,
                "filters": {
                    "countries": ["WW", "US"],
                    "include_subdomains": true,
                    "domains": ["amazon.com", "google.com"]
                },
                "metrics": [
                    "desktop_new_visitors",
                    "desktop_returning_visitors"
                ],
                "paging": {
                    "limit": 10000000
                }
            }
        ]
    }
}

Note: When the delivery method is set to “bucket_access”, the “response_format” field is not required.

The response should include “status”: “pending”.
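The same request can be built in Python. This is a sketch under assumptions: the HTTP method and authentication for create-table are not shown above, so POST and an `api-key` header are assumed, and the helper names are hypothetical. The request is built but not sent, which keeps the body easy to inspect before submitting.

```python
import json
import urllib.request

CREATE_TABLE_URL = "https://api.similarweb.com/batch/v4/integration/create-table"


def build_create_table_body(table_name, report_table):
    """Wrap one report_query table definition with bucket_access delivery.

    No response_format field is added: it is not required for bucket_access.
    """
    return {
        "delivery_information": {
            "delivery_method": "bucket_access",
            "delivery_method_params": {"table_name": table_name},
        },
        "report_query": {"tables": [report_table]},
    }


def create_table_request(api_key, body):
    """Build (but do not send) the create-table request.

    Assumptions: POST method and an "api-key" header carrying the API key.
    """
    return urllib.request.Request(
        CREATE_TABLE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Sending it with `json.load(urllib.request.urlopen(request))` should return a body containing “status”: “pending”.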

Step 2:

Use the Request Report Status endpoint to verify that the report is complete (“status”: “completed”). The table names appear in the response under “path”.
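Checking the status can be automated with a polling loop. In this sketch, `fetch_status` is a hypothetical callable you provide that calls the Request Report Status endpoint and returns the parsed JSON; the interval and timeout defaults are arbitrary choices, not documented limits. The function returns the full response, from which the table names can be read under “path”.

```python
import time


def wait_until_completed(fetch_status, report_id, interval_s=60.0,
                         timeout_s=6 * 3600, sleep=time.sleep):
    """Poll until the report status is "completed", then return the response.

    `fetch_status(report_id)` must return the status endpoint's JSON as a dict.
    Raises TimeoutError if the report is not completed within `timeout_s`.
    `sleep` is injectable so the loop can be tested without waiting.
    """
    waited = 0.0
    while True:
        status = fetch_status(report_id)
        if status.get("status") == "completed":
            return status
        if waited >= timeout_s:
            raise TimeoutError(
                f"report {report_id} still {status.get('status')!r}"
            )
        sleep(interval_s)
        waited += interval_s
```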

Requesting reports - Aggregated recurring tables

To fully automate your analysis, you can have reports delivered to pre-defined tables within your S3 account. Fresh data is then sent directly to the same tables, rather than a new table being created each time you request a report. You can use these tables with Tableau, Looker, and other data visualization tools.

Step 1:

After defining the table, you can use the request-report endpoint.

The schema cannot be changed from that of the original table. To pull the latest data, you can change the filters: domains, countries, and dates.

POST https://api.similarweb.com/batch/v4/request-report

{
    "delivery_information": {
        "delivery_method": "bucket_access",
        "delivery_method_params": {
            "table_name": "{{table_name}}"
        }
    },
    "report_query": {
        "tables": [
            {
                "vtable": "traffic_and_engagement",
                "granularity": "monthly",
                "all_history": true,
                "filters": {
                    "countries": ["WW", "US"],
                    "include_subdomains": true,
                    "domains": ["amazon.com", "google.com"]
                },
                "metrics": [
                    "desktop_new_visitors",
                    "desktop_returning_visitors"
                ],
                "paging": {
                    "limit": 10000000
                }
            }
        ]
    }
}
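For recurring deliveries, one approach is to keep the original request body and change only the filters before each request. A minimal sketch: it treats `vtable`, `granularity` and `metrics` as the schema (an assumption), replaces only the `domains` and `countries` filters, and leaves dates alone because their field names are not shown above. The helper names are hypothetical.

```python
import copy

# Assumption: these fields define the table schema and must not change.
SCHEMA_FIELDS = ("vtable", "granularity", "metrics")


def refresh_request(original_body, domains=None, countries=None):
    """Return a deep copy of an earlier request body with updated filters.

    delivery_information (including table_name) and the schema fields are
    copied unchanged, so the data is delivered to the same table.
    """
    body = copy.deepcopy(original_body)
    for table in body["report_query"]["tables"]:
        filters = table.setdefault("filters", {})
        if domains is not None:
            filters["domains"] = list(domains)
        if countries is not None:
            filters["countries"] = list(countries)
    return body


def schema_of(body):
    """Extract the schema fields of every table, for comparison."""
    return [
        {field: table.get(field) for field in SCHEMA_FIELDS}
        for table in body["report_query"]["tables"]
    ]
```

Before sending, `assert schema_of(new_body) == schema_of(original_body)` guards against accidental schema changes.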