Importing data into Vercel Storage (Postgres) database from AWS S3 Delta Lake #11978

I’m considering managed Vercel Storage on the Pro plan. I have data in AWS Delta Lake on S3, a subset of which needs to be exported into Vercel Postgres on a daily basis. The data may need to be massaged into a form appropriate for Postgres, e.g. from a single Delta Lake table to multiple Postgres tables. Moreover, the data will likely work best with upserts, as some records will need to be updated and others inserted. My estimate is that approximately 100,000 records from my Delta Lake table will be exported to Vercel Postgres each day. The data in the Postgres db will be read-only once loaded.

What is the most cost-efficient approach for doing this on a daily basis? I’m not too concerned about performance as long as the data is imported into Vercel Postgres within a few minutes. I’m having trouble finding any documentation or information that gives advice for a situation similar to this.

Should I be considering Vercel Neon instead? If so, what options are available for importing data? Is it listed here and what would you suggest?

Hi, @lsli-yahoocom!

Thanks for your patience :pray:

For your daily data migration from AWS Delta Lake on S3 to Vercel Postgres, a cost-efficient approach would be to use AWS S3 event notifications to trigger an AWS Lambda function when your Delta Lake data updates.
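One thing to watch with S3 triggers on a Delta table is that a single table update writes several objects (data files plus a commit file under `_delta_log/`). Here is a minimal sketch of filtering the notification down to commit files so the import runs once per commit. The `S3EventLike` type is a hand-written subset of the S3 event payload and `deltaCommits` is a hypothetical helper name, not part of any SDK.

```typescript
// Minimal subset of the S3 event notification payload used here (assumed shape).
interface S3EventLike {
  Records: Array<{ s3: { bucket: { name: string }; object: { key: string } } }>;
}

// Return bucket/key pairs for Delta commit files only, ignoring data files.
function deltaCommits(event: S3EventLike): Array<{ bucket: string; key: string }> {
  return event.Records
    .map((r) => ({
      bucket: r.s3.bucket.name,
      // S3 URL-encodes object keys in notifications and writes spaces as "+".
      key: decodeURIComponent(r.s3.object.key.replace(/\+/g, " ")),
    }))
    // Delta commits are zero-padded JSON files inside the _delta_log/ directory.
    .filter(({ key }) => /(^|\/)_delta_log\/\d+\.json$/.test(key));
}
```

If the table is only written once a day, this should fire the import once a day as well.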

This Lambda function should read the relevant data from S3, transform it as needed, and use the Vercel Postgres SDK to perform upserts on your Vercel Postgres database.
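As a rough illustration of the transform step, here is a sketch that splits one flat record type into rows for two Postgres tables. The record shape, table rows, and field names are all made up for the example, so adjust them to your actual Delta Lake schema.

```typescript
// Hypothetical shape of one flattened Delta Lake record (assumption for illustration).
interface DeltaRecord {
  orderId: string;
  customerId: string;
  customerName: string;
  amount: number;
}

// Hypothetical target rows for two Postgres tables.
interface CustomerRow { id: string; name: string; }
interface OrderRow { id: string; customerId: string; amount: number; }

// Split flat records into per-table rows, keeping one customer row
// per customer id (the last occurrence in the input wins).
function splitRecords(records: DeltaRecord[]): { customers: CustomerRow[]; orders: OrderRow[] } {
  const customers = new Map<string, CustomerRow>();
  const orders: OrderRow[] = [];
  for (const r of records) {
    customers.set(r.customerId, { id: r.customerId, name: r.customerName });
    orders.push({ id: r.orderId, customerId: r.customerId, amount: r.amount });
  }
  return { customers: Array.from(customers.values()), orders };
}
```

Deduplicating by key at this stage matters for upserts, since Postgres rejects a single `INSERT ... ON CONFLICT DO UPDATE` statement that would touch the same row twice.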

This serverless approach is cost-efficient as it only runs when needed, can handle your 100,000 daily records, and allows for necessary data transformations. The upsert process will handle both inserting new records and updating existing ones, meeting your specific requirements.
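In Postgres, the upsert itself is usually written as `INSERT ... ON CONFLICT (...) DO UPDATE`. Below is a sketch of building one parameterized multi-row statement. `buildUpsert` is a hypothetical helper, and the table and column names are placeholders. It assumes the conflict column has a unique or primary key constraint.

```typescript
type SqlValue = string | number | null;

// Build a parameterized multi-row upsert. Table and column names are
// interpolated directly, so they must come from trusted code, never user input.
function buildUpsert(
  table: string,
  columns: string[],
  keyColumn: string,
  rows: SqlValue[][],
): { text: string; values: SqlValue[] } {
  if (rows.length === 0) throw new Error("rows must not be empty");
  const values: SqlValue[] = [];
  const tuples = rows.map((row) => {
    if (row.length !== columns.length) throw new Error("row/column length mismatch");
    // Number placeholders sequentially across all rows: $1, $2, ...
    const placeholders = row.map((v) => {
      values.push(v);
      return `$${values.length}`;
    });
    return `(${placeholders.join(", ")})`;
  });
  // On conflict, overwrite every non-key column with the incoming value.
  const updates = columns
    .filter((c) => c !== keyColumn)
    .map((c) => `${c} = EXCLUDED.${c}`);
  const action = updates.length > 0 ? `DO UPDATE SET ${updates.join(", ")}` : "DO NOTHING";
  const text =
    `INSERT INTO ${table} (${columns.join(", ")}) VALUES ${tuples.join(", ")} ` +
    `ON CONFLICT (${keyColumn}) ${action}`;
  return { text, values };
}
```

The returned `text` and `values` can be handed to any client that accepts a query string plus a parameter array, such as `client.query(text, values)` in node-postgres.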

Vercel Postgres, powered by Neon, is well-suited for this use case and optimized for Vercel. However, you could also consider Vercel Neon directly, as both use the same underlying technology. Either option would support your need for read-only data after import.

To further optimize, batch your upserts, consider compressing data before transfer, and monitor Lambda execution times to adjust memory allocation as needed.
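For the batching part, here is a small sketch of splitting rows into fixed-size chunks so each chunk becomes one upsert statement. Both helper names are hypothetical. The 65,535 figure is the bind-parameter limit per statement in the Postgres wire protocol, which caps how many rows fit in one parameterized statement.

```typescript
// Maximum bind parameters allowed in a single Postgres statement.
const MAX_BIND_PARAMS = 65535;

// Largest row count that fits in one parameterized statement
// when every row binds `columnCount` values.
function maxRowsPerStatement(columnCount: number): number {
  if (!Number.isInteger(columnCount) || columnCount <= 0) {
    throw new Error("columnCount must be a positive integer");
  }
  return Math.floor(MAX_BIND_PARAMS / columnCount);
}

// Split items into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new Error("size must be a positive integer");
  }
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

With 100,000 rows and a batch size of 1,000, that works out to 100 statements per daily run, which should sit comfortably inside a few minutes.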