I first found out about Cube.js from @shortjared back around February of this year, when he was testing it out. I needed to change the way we queried and displayed analytics internally and externally. On occasion I’m asked to run a report for something we don’t have natively in our dashboards, which usually means building SQL queries, running them, and exporting the results as CSV. This is not an ideal workflow and it takes quite some time. Unfortunately, I was unable to implement anything until now!
Cube.js has a wide variety of supported drivers, the ability to set up pre-aggregations, great guides and tutorials, and an active Slack community. It’s simple to get started with and can be introduced in a serverless way, so I wanted to give it a shot.
Why Write A New Driver
Early this year I migrated our RDS MySQL database to Aurora Serverless. I’ve been systematically migrating every non-serverless service (EC2 instances, RDS, etc.) to its serverless counterpart. Our stack is now composed mostly of AppSync, DynamoDB, Lambda functions, CloudFront, and S3 instead of EC2 and RDS. However, to ease the transition, all my functions were running in a VPC in order to access the database, since the EC2 instances were not using the Data API (and honestly, when I started I wasn’t sure if I would use it in production).
In November I began migrating my functions to use the Data API instead of having them reside in the VPC. This has had its own set of challenges, but it also meant that to use Cube.js I’d need a driver capable of talking to the Data API.
Writing The Driver
Contributing a driver to Cube.js was actually quite easy! The mysql-aurora-serverless-driver ended up being only around 150 lines of code. Since Cube.js is very flexible, I just had to emulate the current MySQL driver bindings, pop in @jeremy_daly’s data-api-client, and set up the configuration options. The Cube.js core handles the rest for you.
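To give a rough idea of what "emulating the MySQL driver bindings" can look like, here is a minimal sketch. It is not the code of the published driver. It assumes the core hands the driver MySQL-style SQL with positional ? placeholders, while the Data API expects named parameters, so the sketch rewrites one into the other. The names DataApiClient, DataApiDriverSketch, positionBindings, and toNamedParameters are hypothetical, and the client interface is declared locally to stand in for data-api-client rather than importing it.

```typescript
// One named parameter in the { name, value } form the Data API uses.
interface NamedParameter {
  name: string;
  value: unknown;
}

// Local stand-in for the client (an assumption for this sketch, not the
// real data-api-client typings): run SQL with named parameters, get rows.
interface DataApiClient {
  query(sql: string, parameters: NamedParameter[]): Promise<{ records?: unknown[] }>;
}

// Rewrite positional "?" placeholders as ":b0", ":b1", ...
// Deliberately naive: a "?" inside a string literal would be rewritten too.
function positionBindings(sql: string): string {
  let index = 0;
  return sql.replace(/\?/g, () => `:b${index++}`);
}

// Pair each positional value with the generated placeholder name.
function toNamedParameters(values: unknown[] = []): NamedParameter[] {
  return values.map((value, i) => ({ name: `b${i}`, value }));
}

// Hypothetical driver skeleton exposing the two methods sketched here:
// query() for running SQL and testConnection() as a health check.
class DataApiDriverSketch {
  constructor(private readonly client: DataApiClient) {}

  async query(sql: string, values: unknown[] = []): Promise<unknown[]> {
    const result = await this.client.query(
      positionBindings(sql),
      toNamedParameters(values)
    );
    return result.records ?? [];
  }

  async testConnection(): Promise<unknown[]> {
    return this.query('SELECT 1');
  }
}
```

Because the client is injected through the constructor, the same class can be pointed at a real Data API client in production or at a stub in tests.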
The hardest part was configuring the tests and the network connections for the test-containers. I used local-data-api connected to a MySQL container to get the desired Data API behavior.
Where To Go From Here
Currently, to use Cube.js in production mode you need a Redis instance to connect to. AWS ElastiCache cannot be reached from outside the VPC (which would put the functions back in the VPC once again), so I started building a DynamoDB cache and queue driver. It is very experimental at the moment (unreleased) but can be built and added to the server-core.
The RDS Data API seems to have a hard limit of 45 seconds for a query timeout. If a single query (or a single query in a transaction) runs longer than 45 seconds, it will time out. As far as I can tell, even setting the httpOptions does not override this. Perhaps Aurora Serverless v2 will improve this.