In today’s fast-paced digital world, businesses often need to schedule and execute tasks asynchronously, such as sending notifications, emails, or processing data at specific times. Building a scalable, reliable, and cost-effective task scheduler presents unique challenges, especially when managing time zones and high event volumes.
In this article, I’ll walk you through how we solved this problem using AWS serverless services, including API Gateway, Lambda, Step Functions, Redis, DynamoDB, EventBridge, SES, and SQS. I’ll also share the cost analysis of the system and the results achieved after implementation.
The Problem: Scheduling Events Across Time Zones
At MyDay, we faced a significant challenge: scheduling and executing time-sensitive events (like notifications and emails) for users across multiple time zones. For example, a user in the UK should receive an email at 9 PM UK time, while a user in Pakistan should receive the same email at 9 PM Pakistan time. Previously, we relied on multiple cron jobs, which not only strained our infrastructure but also introduced maintenance headaches. These cron jobs were tightly coupled with functions integrated across multiple places, making the system error-prone and difficult to maintain. Modifying or updating any part of the system was risky, as changes in one function often broke others. When issues arose, tracing and fixing them required manual, time-consuming effort, leading to delays and frustration.
The system had to meet the following requirements:
- Time Zone Awareness: Events must be triggered at the correct local time for each user.
- Scalability: Handle millions of users and events without performance degradation.
- Reliability: Ensure events are executed reliably, even in the face of failures.
- Cost-Effectiveness: Minimize infrastructure costs while maintaining high performance.
The Solution: A Serverless Event Scheduler on AWS
To solve this, we implemented a serverless event scheduler using AWS. By leveraging API Gateway, Lambda, Step Functions, DynamoDB Streams, Redis, and EventBridge, we built a decoupled, scalable, and cost-effective solution that ensures events are triggered at the correct local time for each user. This revamped system seamlessly handles millions of events at roughly $106 per million events, offering both scalability and efficiency. Now, troubleshooting is streamlined, and updates can be made without risking system-wide failures.
Here’s how it works:
1. Event Submission (API Gateway + Lambda)
- AWS Component 1: API Gateway
- Users or systems submit events (jobs) to the system via an API endpoint exposed by Amazon API Gateway.
- API Gateway handles HTTP requests and routes them to the appropriate backend service.
- AWS Component 2: Lambda Function
- API Gateway triggers an AWS Lambda function to process the incoming event.
- The Lambda function validates the event, performs any necessary transformations, and prepares it for scheduling.
- The Lambda function also converts the local time to UTC based on the user’s time zone.
- Example:
9 PM local time in the UK becomes 8 PM UTC during daylight saving time.
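The conversion step above can be sketched with Python’s standard `zoneinfo` module; the function name `local_to_utc` and the timestamp format are illustrative, not the actual MyDay implementation:

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # Python 3.9+ standard library

def local_to_utc(local_dt_str: str, tz_name: str) -> datetime:
    """Convert a naive local timestamp string to an aware UTC datetime."""
    naive = datetime.strptime(local_dt_str, "%Y-%m-%d %H:%M")
    local = naive.replace(tzinfo=ZoneInfo(tz_name))  # attach the user's zone
    return local.astimezone(ZoneInfo("UTC"))

# 9 PM in London during British Summer Time (UTC+1) becomes 8 PM UTC.
utc = local_to_utc("2024-07-01 21:00", "Europe/London")
```

Because `zoneinfo` consults the IANA tz database, daylight saving transitions are handled automatically; the same call in January would return 9 PM UTC.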
2. Store User Data with Time Zone Information
- AWS Component 3: DynamoDB
- The validated event is also stored in Amazon DynamoDB, which acts as the event store.
- DynamoDB persists the event details, including the scheduled time, job metadata, and status.
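As a sketch of what an item in the event store might look like: the GSI attributes `executionDate` and `executionTimeUTC` come from the design described later in this article, while the remaining attribute names (`eventId`, `status`, etc.) are assumptions for illustration:

```python
from datetime import datetime, timezone

def build_event_item(event_id: str, user_id: str, event_type: str,
                     execution_time_utc: datetime, payload: dict) -> dict:
    """Shape a scheduled event for storage in the DynamoDB event store."""
    return {
        "eventId": event_id,              # table partition key (assumed)
        "userId": user_id,
        "eventType": event_type,          # e.g. "email", "push", "script"
        "executionDate": execution_time_utc.strftime("%Y-%m-%d"),  # GSI partition key
        "executionTimeUTC": execution_time_utc.isoformat(),        # GSI sort key
        "status": "SCHEDULED",            # updated to "Success" after execution
        "payload": payload,
    }
```

Storing the already-converted UTC time means every downstream component can ignore time zones entirely.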
3. Event Polling (DynamoDB Streams + Lambda)
- AWS Component 4: DynamoDB Streams
- When a new event is written to DynamoDB, DynamoDB Streams captures the change (insert, update, or delete) in real-time.
- DynamoDB Streams acts as a change data capture (CDC) mechanism, enabling the system to react to new events immediately.
- AWS Component 5: Lambda Function (Triggered by DynamoDB Streams)
- A Lambda function is triggered by DynamoDB Streams whenever a new event is added to the table.
- This Lambda function processes the event and determines whether it needs to be scheduled immediately (if the required execution time falls within the current 15-minute window) or at a future time.
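A minimal sketch of such a Streams-triggered handler, assuming the item shape includes an ISO-8601 `executionTimeUTC` attribute (the 15-minute window matches the polling interval described in the next section):

```python
from datetime import datetime, timezone

WINDOW_MINUTES = 15  # matches the scheduler's 15-minute polling interval

def handler(event, context=None):
    """Lambda entry point for DynamoDB Streams records (sketch)."""
    immediate, deferred = [], []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # only react to newly scheduled events
        image = record["dynamodb"]["NewImage"]
        exec_time = datetime.fromisoformat(image["executionTimeUTC"]["S"])
        delta = (exec_time - datetime.now(timezone.utc)).total_seconds()
        if delta <= WINDOW_MINUTES * 60:
            immediate.append(image)   # execute within the current window
        else:
            deferred.append(image)    # left for the periodic EventBridge poll
    return {"immediate": len(immediate), "deferred": len(deferred)}
```

In the real system, “immediate” events would be pushed straight into the execution path while deferred ones simply wait for the scheduled poll.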
4. Schedule Events Using EventBridge
- During planning we pondered over multiple ways to schedule events. Below, we compare two approaches:
- Creating separate EventBridge rules for different time zones
- Using a Global Secondary Index (GSI) in DynamoDB and a uniform UTC-based cron job with Redis batch processing (Recommended)
Approach 1: Create EventBridge Rules for Separate Time Zones
AWS Service: Amazon EventBridge + Lambda
In this approach, EventBridge rules are created per time zone to schedule events at the calculated UTC time.
For example, a rule is created to trigger a Lambda function at 8 PM UTC for UK users and 3 PM UTC for Canada users.
How It Works?
- Instead of using a fixed cron job, separate EventBridge rules are created per time zone.
- When the event triggers, it executes a Lambda function that processes notifications for users in that time zone.
Example Setup:
- Rule 1: Runs every day at 9 AM UK time (UTC+0/UTC+1 in DST).
- Rule 2: Runs every day at 6 PM PK time (UTC+5).
- Rule 3: Runs every day at 2 PM Canada time (UTC-5/UTC-4 in DST).
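A small helper makes the shape of these rules concrete. EventBridge cron expressions are always evaluated in UTC, so each rule bakes in a static offset; the function below is illustrative and also exposes Approach 1’s weakness, since a DST change would require updating the rule:

```python
def eventbridge_cron(local_hour: int, utc_offset_hours: int) -> str:
    """Build a daily EventBridge cron expression for a given local hour.

    EventBridge evaluates cron expressions in UTC, so the local hour is
    shifted by the zone's UTC offset. The offset is static: daylight
    saving transitions would require recreating the rule.
    """
    utc_hour = (local_hour - utc_offset_hours) % 24
    return f"cron(0 {utc_hour} * * ? *)"

# Rule expressions for the example setup above:
eventbridge_cron(9, 0)    # UK, 9 AM local (UTC+0)
eventbridge_cron(18, 5)   # PK, 6 PM local (UTC+5)
eventbridge_cron(14, -5)  # Canada, 2 PM local (UTC-5)
```

Each expression would then be passed to EventBridge when creating the per-zone rule.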
✅ Pros:
✔ Accurate Scheduling → Events run exactly at the local time for each region.
✔ No Need for Additional Filtering → Since each rule is timezone-specific, the Lambda function does not need to filter users.
❌ Cons:
⚠ 300 EventBridge Rule Limit → AWS allows 300 rules per event bus by default (a soft limit). If there are too many unique time slots, this approach can become unmanageable.
⚠ Difficult Scaling → If new time slots are added, new EventBridge rules must be created dynamically.
⚠ Higher AWS Costs → Each EventBridge rule triggers a Lambda function separately, increasing execution cost.
Approach 2: Use a Single EventBridge Rule with GSI, Redis & Batch Processing (Recommended)
AWS Service: Amazon EventBridge + DynamoDB (GSI) + Redis Batch Processing
Instead of creating separate EventBridge rules for each time zone, we store all events in DynamoDB with UTC timestamps. A single EventBridge cron job runs every 15 minutes, fetching events that need to be executed within the next 15 minutes.
How It Works?
- DynamoDB stores events with a Global Secondary Index (GSI) on executionTimeUTC (Partition Key: executionDate, Sort Key: executionTimeUTC).
- A single EventBridge rule triggers every 15 minutes, regardless of time zone.
- Lambda fetches scheduled events from DynamoDB and stores them in Redis for quick lookup.
- Batch Processing with Redis: Events are grouped into batches based on their type (e.g., email notifications, push notifications, data processing jobs). After every 15-minute interval, all events are fetched from Redis and executed in parallel, ensuring efficient processing without unnecessary delays.
- After execution, events are marked as “Success” in DynamoDB.
- The process repeats, fetching the next batch of events every 15 minutes.
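The GSI lookup in the steps above can be sketched as a parameter builder for a DynamoDB `Query` call. The index name `executionTimeUTC-index` is an assumption; the key schema follows the GSI described above:

```python
from datetime import datetime, timedelta, timezone

def next_window_query(now: datetime, window_minutes: int = 15) -> dict:
    """Build DynamoDB Query parameters for events due in the next window,
    using the GSI (Partition Key: executionDate, Sort Key: executionTimeUTC)."""
    end = now + timedelta(minutes=window_minutes)
    return {
        "IndexName": "executionTimeUTC-index",  # assumed GSI name
        "KeyConditionExpression":
            "executionDate = :d AND executionTimeUTC BETWEEN :start AND :end",
        "ExpressionAttributeValues": {
            ":d": {"S": now.strftime("%Y-%m-%d")},
            ":start": {"S": now.isoformat()},
            ":end": {"S": end.isoformat()},
        },
    }
```

One caveat of partitioning by date: a window that crosses midnight UTC spans two partition values and needs a second query for the next day.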
✅ Pros:
✔ Efficient Scaling → Supports unlimited time zones and schedules with just one EventBridge rule.
✔ Lower Costs → Fewer EventBridge rules and fewer Lambda invocations.
✔ Faster Execution with Redis Batch Processing → Instead of fetching each event individually, Redis allows batch retrieval and execution, reducing delays.
✔ Easy to Manage → No need to dynamically create or delete EventBridge rules.
❌ Cons:
⚠ Requires Redis → Adds an extra caching layer, but it’s necessary for performance.
⚠ Lambda Execution Time May Increase → If too many events are scheduled in a 15-minute window, Lambda processing may take longer.
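The batch-grouping step in Approach 2 reduces to a simple partition of the fetched events by type before they are written to Redis. A minimal sketch (in the real system each batch would live under a Redis key such as `batch:<type>`; that key naming is an assumption):

```python
from collections import defaultdict

def group_events_by_type(events: list[dict]) -> dict[str, list[dict]]:
    """Partition fetched events into per-type batches for parallel execution."""
    batches: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        batches[event["eventType"]].append(event)
    return dict(batches)
```

Grouping up front means each batch can be handed to its own worker (email, push, script) and executed in parallel rather than one event at a time.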
5. Execute Events
- AWS Services: AWS Step Functions, Lambda, Amazon SES, Firebase Cloud Messaging (FCM), Redis, DynamoDB
- Instead of using a single Lambda function for all event processing, we now use AWS Step Functions to orchestrate event execution. This prevents Lambda exhaustion, improves scalability, and ensures efficient event handling.
Workflow:
- Event Trigger: EventBridge triggers the Step Function every 15 minutes to process scheduled events.
- Fetch Events from Redis: A Lambda function fetches events scheduled for the next 15 minutes from Redis.
- If Redis is empty (on first execution or after completion of a batch), another function retrieves the next batch of events from DynamoDB and stores them in Redis.
- Determine Event Type: Step Functions determine the type of event and direct it to the appropriate execution path.
Execution Paths:
- Email Notifications:
- Fetch email details from Amazon DocumentDB based on category.
- Dynamically compose email content.
- Send the email using Amazon SES.
- Push Notifications:
- Fetch notification details with dynamic text, titles, and payloads.
- Call internal notification scripts to construct and send push notifications via Firebase Cloud Messaging (FCM).
- Internal Script Execution:
- Execute custom business logic scripts if the event requires internal processing.
Post-Execution:
- Once an event is processed, its status is updated in DynamoDB with execution details.
- Successfully executed events are removed from Redis.
- The process repeats, ensuring that the next batch of scheduled events is always ready for execution.
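The routing in this workflow maps naturally onto a Step Functions `Choice` state. The sketch below is an illustrative Amazon States Language definition built as a Python dict; the ARNs, state names, and the `$.eventType` field are all placeholders, not the actual MyDay state machine:

```python
import json

state_machine = {
    "Comment": "Process one scheduled event by type (illustrative)",
    "StartAt": "RouteByEventType",
    "States": {
        "RouteByEventType": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.eventType", "StringEquals": "email",
                 "Next": "SendEmail"},
                {"Variable": "$.eventType", "StringEquals": "push",
                 "Next": "SendPushNotification"},
            ],
            "Default": "RunInternalScript",
        },
        "SendEmail": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:send-email",
            "Retry": [{"ErrorEquals": ["States.ALL"],
                       "IntervalSeconds": 5, "MaxAttempts": 2}],
            "Next": "MarkSuccess",
        },
        "SendPushNotification": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:send-push",
            "Retry": [{"ErrorEquals": ["States.ALL"],
                       "IntervalSeconds": 5, "MaxAttempts": 2}],
            "Next": "MarkSuccess",
        },
        "RunInternalScript": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:run-script",
            "Next": "MarkSuccess",
        },
        "MarkSuccess": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:mark-success",
            "End": True,
        },
    },
}

definition_json = json.dumps(state_machine, indent=2)
```

Keeping each execution path as its own `Task` state is what allows per-step retries and avoids one long-running Lambda doing everything.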
Benefits:
- ✅ Improved scalability and efficiency with Step Functions.
- ✅ Reduced Lambda execution time by breaking execution into separate steps.
- ✅ Reliable event scheduling using Redis as a temporary store.
- ✅ Automatic Retries: Step Functions automatically retry failed executions; failures can also be sent to an SQS Dead Letter Queue (DLQ) for manual review.
- ✅ Cost-effective processing by avoiding Lambda overuse and enabling batch execution.
6. Handle Failures with a Dead Letter Queue (DLQ)
- AWS Service: Amazon SQS
- If Lambda fails to process an event after all retries (by default, 2 retries for asynchronous invocations), it automatically sends the event to the configured DLQ.
- This helps in debugging and retrying failed executions.
7. Monitor the System with CloudWatch
- AWS Service: Amazon CloudWatch
- CloudWatch is used to monitor the system, track Lambda invocations, and log errors.
- Alarms are set up to notify the team when failures occur.
The Results: Scalability, Reliability, and Cost Savings
After implementing the system, we achieved the following results:
1. Scalability
- The system can handle millions of events across multiple time zones without performance degradation.
- DynamoDB and EventBridge scale automatically to meet demand.
2. Reliability
- Events are executed reliably at the correct local time for each user.
- The DLQ ensures that failed events are not lost and can be retried or analyzed.
3. Cost-Effectiveness
- The system leverages serverless services, which are highly cost-effective for unpredictable workloads.
- Here’s the cost breakdown:
- API Gateway: $1.00/1 million events
- Lambda: $1.24/1 million events
- DynamoDB: $1.75/1 million events
- DynamoDB Streams: $0.20/1 million events
- EventBridge: $2.00/1 million events
- SES: $100/1 million events
- Total Cost: ≈$106 per million events.
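As a quick sanity check on the breakdown, the line items sum to the quoted total, and they also yield the SES share cited in the observations below:

```python
cost_per_million = {
    "API Gateway": 1.00,
    "Lambda": 1.24,
    "DynamoDB": 1.75,
    "DynamoDB Streams": 0.20,
    "EventBridge": 2.00,
    "SES": 100.00,
}
total = sum(cost_per_million.values())       # 106.19, i.e. ≈ $106 per million
ses_share = cost_per_million["SES"] / total  # ≈ 0.94, i.e. SES is ~94% of cost
```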
4. Improved Developer Productivity
- The serverless architecture reduced the operational overhead, allowing the team to focus on building features rather than managing infrastructure.
Key Observations
- The most expensive component is SES, contributing ~94% of the total cost.
- The serverless components (API Gateway, Lambda, DynamoDB, EventBridge, and SQS) are extremely cost-effective.
- The system is highly scalable, handling millions of events at just ≈$106 per million.
Conclusion
By leveraging AWS serverless services, we built a scalable, reliable, and cost-effective event scheduler that meets the needs of MyDay. The system handles millions of events across multiple time zones, ensures reliable execution, and minimizes infrastructure costs.
If you’re facing similar challenges, I highly recommend exploring AWS serverless services. They provide the flexibility, scalability, and cost-effectiveness needed to build modern applications.
If you’re tackling similar challenges, let’s connect! I’m always open to discussing scalable software solutions—reach out on LinkedIn, and let’s build something amazing together.
