
Longer term, I'm looking to migrate all my DBs to PostgreSQL on AWS RDS and manage everything that way, so I'm curious to hear ideas (based on any assumptions) about the best architecture going forward.
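Roughly, what I picture is the scraper writing its results straight into Postgres instead of local CSV files, so there would be no file state to keep in sync at all. A minimal sketch of what I mean, assuming psycopg2 (the host, credentials, table and column names below are made-up placeholders, not my real setup):

```python
# Rough sketch (not my actual code): the scraper writes rows straight to
# PostgreSQL on RDS instead of local CSV files. Host, credentials, table and
# column names are made-up placeholders.
import psycopg2

RDS_DSN = (
    "host=my-db-instance.xxxxxxxx.eu-west-1.rds.amazonaws.com "
    "dbname=scraperdb user=scraper_user password=CHANGE_ME"
)

def save_results(rows):
    """Insert a batch of scraped (name, value) tuples into a results table."""
    conn = psycopg2.connect(RDS_DSN)
    try:
        with conn:  # transaction scope: commits on success, rolls back on error
            with conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO results (name, value) VALUES (%s, %s)",
                    rows,
                )
    finally:
        conn.close()

if __name__ == "__main__":
    save_results([("example-item", "42")])
```

The idea would be that RDS becomes the single source of truth, so the scheduled runs on EC2 and my local test runs read and write the same state.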

Right now, though, I feel I'm over-complicating my workflows, having to SSH-download files to keep state aligned between the remote machine and my localhost. Whenever I run my scraper locally, I want to be sure I'm building on the latest state of the DB files, so effectively I need a function that downloads the latest DB files from my EC2 instance. I've been experimenting with scp -i in my cmd console, and that works fine: I'm able to retrieve my files. But I'm having trouble wrapping these commands in Python using os or subprocess, either running the command directly or running a batch file with the commands in it. It seems that inside my virtual environment, Python cannot find ssh.exe in C:/Windows/System32/OpenSSH. I could not solve this, and realized from StackOverflow threads that most users use the paramiko library to SSH directly into their EC2 instance instead.
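The direction I've been attempting looks roughly like the sketch below: call the same scp command that works in cmd, but via subprocess, pointing at the binary by its full path so the virtual environment's PATH doesn't matter. The key file, user, host and remote folder are placeholders for my real values, and I'm assuming scp.exe sits in the same OpenSSH folder as ssh.exe:

```python
# Sketch of the wrapper I'm trying to write: run the scp command that already
# works in cmd, but through subprocess, using the full path to the OpenSSH
# binary so the venv's PATH doesn't matter. Key file, user, host and remote
# folder are placeholders.
import subprocess
from pathlib import Path

SCP_EXE = r"C:\Windows\System32\OpenSSH\scp.exe"  # assuming scp.exe sits next to ssh.exe here
KEY_FILE = r"C:\keys\my-ec2-key.pem"              # placeholder .pem key
REMOTE = "ubuntu@ec2-xx-xx-xx-xx.compute.amazonaws.com:/home/ubuntu/scraper/db/*.csv"  # placeholder

def download_latest_db(local_dir="db"):
    """Pull the latest DB csv files from the EC2 instance into the local db/ folder."""
    Path(local_dir).mkdir(exist_ok=True)
    subprocess.run([SCP_EXE, "-i", KEY_FILE, REMOTE, local_dir], check=True)

if __name__ == "__main__":
    download_latest_db()
```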

For context: I've set up cronjobs to run my scraper at scheduled intervals. At each run, the scraper saves a set of results to a relative local folder (db/db1.csv, db/db2.csv, etc.). The state of these DB files is quite important, as they are then used as the source to compile API calls and maintain a website. I would also like to run the scraper from my local machine, for testing purposes, which is where pulling the latest files first comes in (see the sketch below).
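If I went the paramiko route from those StackOverflow threads, I imagine syncing the latest csv files over SFTP before a local test run would look roughly like this; no external ssh.exe/scp.exe would be needed at all. Host, key path, username and the remote directory below are placeholders:

```python
# Sketch of the paramiko alternative: download every .csv from the remote db/
# folder over SFTP before running the scraper locally. Host, key path,
# username and remote directory are placeholders.
import os
import paramiko

HOST = "ec2-xx-xx-xx-xx.compute.amazonaws.com"  # placeholder
KEY_FILE = r"C:\keys\my-ec2-key.pem"            # placeholder
REMOTE_DB_DIR = "/home/ubuntu/scraper/db"       # placeholder
LOCAL_DB_DIR = "db"

def sync_db_files():
    """Download every .csv in the remote db/ folder into the local db/ folder."""
    os.makedirs(LOCAL_DB_DIR, exist_ok=True)
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(HOST, username="ubuntu", key_filename=KEY_FILE)
    try:
        sftp = client.open_sftp()
        for name in sftp.listdir(REMOTE_DB_DIR):
            if name.endswith(".csv"):
                sftp.get(f"{REMOTE_DB_DIR}/{name}", os.path.join(LOCAL_DB_DIR, name))
        sftp.close()
    finally:
        client.close()

if __name__ == "__main__":
    sync_db_files()  # then run the scraper locally against the fresh files
```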

For background, I built the scraper in Python, then spun up an EC2 instance and dumped it there. Any gentle hints are much appreciated :)
