IPFS data caching on AWS

By following these steps, you can create a scalable, serverless architecture that efficiently caches and serves your IPFS-based data, improving the responsiveness and reliability of your site.

Art Quant
Mar 5, 2024

To improve your site’s performance, you can cache your IPFS-based data by following the steps below. The solution uses a serverless architecture to handle requests and cache data efficiently. AWS (Amazon Web Services) serves as the example platform here, with AWS Lambda, Amazon DynamoDB, and Amazon CloudFront, but similar services are available on other cloud providers such as Azure or Google Cloud.

Step 1: Set up your serverless function (AWS Lambda)

Create an AWS Lambda function:

  • This function will handle requests to your endpoint, check the cache, fetch data from IPFS if needed, and update the cache.

Request Handling: The Lambda function should:

  • Extract the CID from the request URL.
  • Check if the data is available in the DynamoDB table (used as a key-value store).
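As a rough illustration of the CID extraction step, the sketch below takes the last segment of the request path and checks that it looks like a CID. The function name extract_cid and the pattern are assumptions made for this example; the pattern is a loose sanity check, not a full CID parser.

```python
import re

# Loose patterns, not a full CID parser:
# a CIDv0 is 46 base58 characters starting with "Qm";
# a CIDv1 in its default base32 text form starts with "b".
CID_PATTERN = re.compile(r"^(Qm[1-9A-HJ-NP-Za-km-z]{44}|b[a-z2-7]{20,})$")


def extract_cid(path):
    """Return the last path segment if it looks like a CID, else None."""
    segment = path.rstrip("/").rsplit("/", 1)[-1]
    return segment if CID_PATTERN.match(segment) else None


# Usage: a synthetic CIDv0-shaped value and an obviously invalid one
example_cid = "Qm" + "a" * 44
print(extract_cid(f"/url/{example_cid}"))
print(extract_cid("/url/not-a-cid"))
```

Rejecting malformed values early means the function never spends a DynamoDB read or an IPFS request on input that cannot be a CID.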

Setting up a serverless function using AWS Lambda involves several steps. Below is a detailed guide on how to create and configure an AWS Lambda function that will handle requests, interact with IPFS, check cache status, and update your DynamoDB and CDN as necessary.

1. Sign in to the AWS Management Console

First, you need to log in to your AWS account. If you don’t have one, you’ll need to create an account at AWS.

2. Create a New Lambda Function

  1. In the AWS Management Console, navigate to the Lambda service.
  2. Click on the Create function button.
  3. Choose Author from scratch.
  4. Enter a function name, e.g., IPFSCacheHandler.
  5. Select a runtime, for example a Python 3 runtime if you’re writing the function in Python.
  6. Choose or create an execution role that grants your function permission to access AWS resources. This role needs permissions to access DynamoDB, S3, and CloudFront.

3. Function Code

  1. You can write your function code directly in the inline code editor in the Lambda console or upload a .zip file containing your code.
  2. Implement the logic to:
  • Parse the CID from the incoming request URL.
  • Check DynamoDB for existing data using the CID as the key.
  • If a cache miss occurs, fetch the JSON from IPFS using the CID, validate it, extract the image CID, fetch the image, and update DynamoDB and the CDN.
  • Respond with the data, either from the cache or freshly fetched.
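The logic above is a cache-aside flow. A minimal sketch of it follows, with the DynamoDB and IPFS calls passed in as plain functions so the flow can be tried without any AWS resources. The names get_cached_or_fetch, lookup, fetch, and store are hypothetical and exist only for this illustration.

```python
def get_cached_or_fetch(cid, lookup, fetch, store):
    """Return (data, from_cache) for a CID using a cache-aside flow.

    lookup(cid)       -> cached value or None (e.g. a DynamoDB get_item wrapper)
    fetch(cid)        -> fresh value          (e.g. an IPFS gateway request)
    store(cid, value) -> None                 (e.g. a DynamoDB put_item wrapper)
    """
    cached = lookup(cid)
    if cached is not None:
        return cached, True   # cache hit: no IPFS request needed
    fresh = fetch(cid)        # cache miss: go to IPFS
    store(cid, fresh)         # populate the cache for the next request
    return fresh, False


# Usage with an in-memory dict standing in for DynamoDB
cache = {}
calls = []


def fake_fetch(cid):
    calls.append(cid)  # record how often "IPFS" is contacted
    return {"name": "example", "cid": cid}


first = get_cached_or_fetch("cid-1", cache.get, fake_fetch, cache.__setitem__)
second = get_cached_or_fetch("cid-1", cache.get, fake_fetch, cache.__setitem__)
print(first[1], second[1], len(calls))
```

Because content behind a CID never changes, a cached entry does not need to be invalidated when the same CID is requested again.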

4. Set Environment Variables

You may need to set environment variables for your Lambda function, such as API endpoints for IPFS, or configuration settings for accessing your DynamoDB table or S3 bucket.
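One possible way to read such settings inside the function is sketched below. The variable names CACHE_TABLE_NAME, IMAGE_BUCKET_NAME, and IPFS_GATEWAY_URL are assumptions for this example; use whatever names you define in the Lambda configuration.

```python
import os


def load_config(env=os.environ):
    """Read settings from environment variables, with fallbacks for local runs."""
    gateway = env.get("IPFS_GATEWAY_URL", "https://ipfs.io/ipfs/")
    return {
        "table_name": env.get("CACHE_TABLE_NAME", "IPFSCache"),
        "bucket_name": env.get("IMAGE_BUCKET_NAME", "your-s3-bucket-name"),
        # Normalize so the gateway URL always ends with exactly one slash
        "ipfs_gateway": gateway.rstrip("/") + "/",
    }


config = load_config()
print(config["table_name"], config["ipfs_gateway"])
```

Keeping these values out of the code lets you point the same function at a different table, bucket, or gateway without redeploying it.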

5. Configure Triggers

You’ll need to configure an API Gateway trigger so that your Lambda function can be invoked via HTTP requests:

  • Go to the Function overview section (labeled Designer in older versions of the console) on your Lambda function’s page.
  • Click on Add trigger.
  • Select API Gateway.
  • Create a new API or use an existing one.
  • Define the security for the API (e.g., open with API key, IAM permissions).
  • The API Gateway will expose an HTTP endpoint that can be used to invoke your Lambda function.

6. Test Your Function

  1. You can test your Lambda function directly in the AWS Console by configuring test events.
  2. Create a test event that mimics the API Gateway request, including a path with a CID.
  3. Execute the test to ensure your function behaves as expected, handling cache logic and interacting with DynamoDB and IPFS appropriately.
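A test event for this function could look roughly like the one built below. It is a trimmed-down version of the event that API Gateway sends with Lambda proxy integration; the helper name make_test_event and the CID value are placeholders for this example.

```python
import json


def make_test_event(cid):
    """Build a minimal API Gateway proxy event for GET /url/{cid}."""
    return {
        "resource": "/url/{cid}",
        "path": f"/url/{cid}",
        "httpMethod": "GET",
        "headers": {"Accept": "application/json"},
        "queryStringParameters": None,
        "pathParameters": {"cid": cid},
        "body": None,
        "isBase64Encoded": False,
    }


event = make_test_event("QmExampleCid")  # placeholder, not a real CID
print(json.dumps(event, indent=2))
```

You can paste the printed JSON into the console’s test event editor, or pass the dictionary straight to your handler when running it locally.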

7. Deploy and Monitor

  • Once your function is working as expected, deploy it.
  • Monitor its performance and logs via CloudWatch to ensure it’s operating correctly and to troubleshoot any issues.
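To make the CloudWatch logs easier to search, you might log one structured line per request. The sketch below uses Python’s standard logging module, whose output Lambda forwards to CloudWatch Logs; the logger name, the function name log_cache_event, and the field names are assumptions for this example.

```python
import json
import logging

logger = logging.getLogger("ipfs_cache")
logger.setLevel(logging.INFO)


def log_cache_event(cid, hit, duration_ms):
    """Log a single JSON line describing how a request was served."""
    line = json.dumps({
        "cid": cid,
        "cache_hit": hit,
        "duration_ms": round(duration_ms, 1),
    })
    logger.info(line)
    return line


log_cache_event("cid-1", True, 12.34)
```

With JSON lines like these, you can filter on cache_hit in CloudWatch Logs to see how often requests are served from the cache.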

This setup creates a serverless function that efficiently serves as a middleman between your users and your IPFS data, caching data as needed to improve performance and reduce load times.

Step 2: Set up DynamoDB as a key-value store

Create a DynamoDB table:

  • Use the IPFS CID as the primary key. The table will store the JSON data and metadata about the associated image.

Cache Logic in Lambda:

  • If the data is in DynamoDB, return it immediately.
  • If not, fetch the JSON from IPFS using the CID, validate it, and extract the image CID.

To set up Amazon DynamoDB as a key-value store for your application, follow these steps. DynamoDB will be used to cache your IPFS data, identified by the IPFS CID as the key.

1. Sign in to the AWS Management Console

First, ensure you are logged into your AWS account and navigate to the DynamoDB service dashboard.

2. Create a New Table

  1. Click on Create table.
  2. Enter a Table name, e.g., IPFSCache.
  3. For the Partition key, input cid and set the type to String. This will be the IPFS CID.
  4. You can leave the Sort key empty, as it’s not required for this use case.
  5. Click on Create without adding any additional indexes for now. The table settings can be adjusted based on performance and access patterns later.
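If you would rather create the table from code than from the console, a sketch using Boto3 might look like this. It assumes on-demand billing (PAY_PER_REQUEST), which is a choice made for this example rather than something the console steps above require; the helper names are hypothetical.

```python
def cache_table_definition(table_name="IPFSCache"):
    """Parameters for a table keyed only by the IPFS CID (no sort key)."""
    return {
        "TableName": table_name,
        "KeySchema": [{"AttributeName": "cid", "KeyType": "HASH"}],
        "AttributeDefinitions": [{"AttributeName": "cid", "AttributeType": "S"}],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def create_cache_table(table_name="IPFSCache"):
    """Create the table and wait until it is ready (needs AWS credentials)."""
    import boto3  # imported here so the definition can be inspected without AWS access

    client = boto3.client("dynamodb")
    client.create_table(**cache_table_definition(table_name))
    client.get_waiter("table_exists").wait(TableName=table_name)


print(cache_table_definition())
```

Only the key attribute is declared up front; other attributes, such as the cached JSON and the image URL, are added per item when you write to the table.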

3. Configure Table Settings (Optional)

  • Provisioned throughput: If you know your application’s read/write capacity requirements, you can set them; otherwise, you can use the default settings and enable auto-scaling.
  • Encryption: By default, DynamoDB encrypts all data at rest.
  • Tags: Add tags if necessary for organization or billing purposes.

4. Access Control

  • Ensure that your Lambda function has the necessary IAM permissions to access this DynamoDB table. You will need to grant it actions like dynamodb:GetItem, dynamodb:PutItem, and dynamodb:UpdateItem.
  • Create an IAM role for your Lambda function or attach the necessary policies to an existing role.
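A policy granting only these actions on the cache table could be generated as follows. The account ID and region in the ARN are placeholders, and the helper name cache_table_policy is hypothetical.

```python
import json


def cache_table_policy(table_arn):
    """IAM policy document allowing cache reads and writes on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                ],
                "Resource": table_arn,
            }
        ],
    }


# Placeholder ARN: replace the region and account ID with your own
policy = cache_table_policy("arn:aws:dynamodb:us-east-1:123456789012:table/IPFSCache")
print(json.dumps(policy, indent=2))
```

Scoping the Resource to the single table, rather than using a wildcard, keeps the function from touching other tables in the account.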

5. Use the Table in Your Application

  • In your Lambda function, reference this DynamoDB table for read (cache lookup) and write (cache update) operations.
  • When storing data, use the CID as the key for easy retrieval. Store any additional necessary information as attributes in the table items.

Example Python Snippet for DynamoDB Access

Here’s how you might interact with DynamoDB in your Lambda function using Boto3:

import json

import boto3

# Initialize the DynamoDB resource (Boto3's higher-level interface)
dynamodb = boto3.resource('dynamodb')
# Reference your table
table = dynamodb.Table('IPFSCache')


# Example function to fetch data from DynamoDB
def fetch_from_dynamodb(cid):
    try:
        response = table.get_item(Key={'cid': cid})
        return response.get('Item')  # None on a cache miss
    except Exception as e:
        print(f"Error fetching item from DynamoDB: {str(e)}")
        return None


# Example function to store data in DynamoDB
def store_in_dynamodb(cid, data):
    try:
        # Store the JSON as a string: Boto3 rejects Python floats in items
        table.put_item(Item={'cid': cid, 'data': json.dumps(data)})
    except Exception as e:
        print(f"Error storing item to DynamoDB: {str(e)}")

This setup establishes DynamoDB as a key-value store for your cached data, leveraging the CID as a unique identifier to facilitate efficient retrieval and update operations.

Step 3: Handle Image Caching and Storage

Fetch and Store Image:

  • If the JSON contains an image CID, fetch the image from IPFS.
  • Store the image in Amazon S3 and serve it through a CDN such as Amazon CloudFront for faster delivery. S3 holds the image, and CloudFront distributes it.

Update DynamoDB:

  • Once the JSON and image are fetched and stored, update the DynamoDB table with this data to serve future requests quickly.

Handling image caching and storage effectively involves retrieving the image from IPFS, storing it in Amazon S3, and using Amazon CloudFront as a CDN to cache and deliver the image efficiently. Here’s how you can implement this step-by-step:

1. Retrieve the Image from IPFS

When your Lambda function retrieves JSON data from IPFS, and it includes an image CID, you’ll need to fetch the image from IPFS. You can do this using an HTTP request to an IPFS gateway or through a direct IPFS node if you have one set up.
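As a rough sketch of the gateway approach, the code below downloads a CID’s content over HTTP using only the standard library. The ipfs.io gateway is the one used elsewhere in this article; the helper names, the timeout, and the size limit are assumptions for this example.

```python
from urllib.request import Request, urlopen


def gateway_url(cid, gateway="https://ipfs.io/ipfs/"):
    """Build the HTTP URL for a CID on a public or private IPFS gateway."""
    return gateway.rstrip("/") + "/" + cid


def fetch_ipfs_bytes(cid, gateway="https://ipfs.io/ipfs/", timeout=10,
                     max_bytes=10 * 1024 * 1024):
    """Download the content and return (bytes, content_type)."""
    request = Request(gateway_url(cid, gateway), headers={"Accept": "*/*"})
    with urlopen(request, timeout=timeout) as response:
        body = response.read(max_bytes + 1)  # read one extra byte to detect oversize
        if len(body) > max_bytes:
            raise ValueError("IPFS object is larger than the allowed size")
        content_type = response.headers.get("Content-Type", "application/octet-stream")
        return body, content_type


print(gateway_url("QmExampleCid"))  # placeholder, not a real CID
```

A timeout and a size cap matter in Lambda, where a slow gateway or an unexpectedly large file would otherwise consume the function’s time and memory budget.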

2. Store the Image in Amazon S3

  • Create an S3 Bucket: If you don’t already have an S3 bucket, create one in the AWS Management Console. Ensure it’s configured correctly for public access if the images should be publicly accessible.
  • Upload the Image to S3: Modify your Lambda function to upload the retrieved image to your S3 bucket. You’ll use the Boto3 library for this, and the function’s execution role needs the s3:PutObject permission on the bucket.

Example Python code to upload an image to S3:

import boto3

s3 = boto3.client('s3')


def upload_image_to_s3(bucket_name, image_key, image_data, content_type='application/octet-stream'):
    try:
        # ContentType sets the Content-Type header served with the object
        s3.put_object(Bucket=bucket_name, Key=image_key, Body=image_data,
                      ContentType=content_type)
    except Exception as e:
        print(f"Error uploading image to S3: {str(e)}")

  • Set Proper Metadata: When uploading the image, set the correct Content-Type (for example, image/png) so that browsers render it properly.
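If the IPFS gateway does not tell you the image type, one option is to infer it from the first bytes of the file. The sketch below recognizes a few common formats by their signatures; the function name sniff_image_content_type is hypothetical, and the list of formats is deliberately short.

```python
def sniff_image_content_type(data):
    """Guess a Content-Type from the file signature at the start of the data."""
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if data.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "image/gif"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"  # unknown: let the client decide


print(sniff_image_content_type(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
```

The returned value can be passed as the ContentType argument when you upload the object to S3.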

3. Use Amazon CloudFront for Caching and Delivery

  1. Create a CloudFront Distribution: Set up a CloudFront distribution with your S3 bucket as the origin. This step will enable caching and faster delivery of your images globally.
  2. Configure Cache Behavior: Define how CloudFront caches your content. You can specify cache lifetimes, query string parameters, cookies, and more.
  3. Use the CloudFront URL for Image Access: Instead of directly accessing images from S3, use the CloudFront distribution’s URL. This change ensures users get cached content when available, reducing load times and S3 access costs.
  4. Update Your Application: Modify your Lambda function or application logic to reference the CloudFront URL for the image instead of the S3 URL directly.
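Building the CloudFront URL for a stored image is mostly string assembly, as the sketch below shows. The distribution domain used here is a placeholder; yours is shown on the distribution’s page in the CloudFront console. The helper name cloudfront_url is hypothetical.

```python
from urllib.parse import quote


def cloudfront_url(distribution_domain, object_key):
    """Return the public URL of an S3 object served through CloudFront."""
    # quote() keeps "/" intact, so key prefixes such as "images/" survive
    return f"https://{distribution_domain}/{quote(object_key.lstrip('/'))}"


# Placeholder domain and key
print(cloudfront_url("d111111abcdef8.cloudfront.net", "images/QmExampleCid"))
```

Since the object key mirrors the key used for the S3 upload, the only extra setting your function needs is the distribution’s domain name.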

4. Update the DynamoDB Record (Optional)

Once the image is stored in S3 and available via CloudFront, you might want to update the corresponding DynamoDB record with the CloudFront URL of the image. This update allows your application to directly use the cached version next time without fetching it from IPFS.

Example Python code to update a DynamoDB item:

# Assumes `table` is the DynamoDB Table object created in the earlier snippet
def update_dynamodb_record(cid, cloudfront_url):
    try:
        table.update_item(
            Key={'cid': cid},
            UpdateExpression='SET image_url = :val1',
            ExpressionAttributeValues={':val1': cloudfront_url}
        )
    except Exception as e:
        print(f"Error updating DynamoDB record: {str(e)}")

By following these steps, you create an efficient pipeline that caches your IPFS images in S3 and delivers them through CloudFront, significantly improving the performance and user experience of your application.

Step 4: Create an API Gateway

  1. Set up API Gateway: Create an API endpoint that triggers your Lambda function. This is where your users will request data (e.g., https://serverless-example.com/url/QmbMV...).
  2. Cache Configuration: Consider configuring API Gateway caching to further improve response times for frequently accessed data.

To create an API Gateway that triggers your AWS Lambda function, follow these steps. This setup will allow your application to handle HTTP requests and serve the cached data or fetch and cache data when necessary.

1. Create a New API Gateway

  1. In the AWS Management Console, navigate to the API Gateway service.
  2. Click on Create API.
  3. Choose REST API (for this example) and click on Build.
  4. Select New API and give it a name, e.g., IPFSCacheAPI.
  5. Leave the endpoint type as Regional unless you have specific needs for an Edge-optimized API.
  6. Click on Create API.

2. Create a Resource and Method

  1. In the left sidebar, click on Resources.
  2. Select the root resource (/) and click on Actions -> Create Resource.
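  3. Enter a resource name and path, e.g., url. Then, with url selected, create a child resource with the path {cid} so that API Gateway passes the CID to Lambda as a path parameter.
  4. With the {cid} resource selected, click on Actions -> Create Method.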
  5. Select GET from the dropdown and click on the checkmark to confirm.
  6. Configure the method:
  • For the integration type, select Lambda Function.
  • Check the box for Use Lambda Proxy integration.
  • Select the region where your Lambda function is located.
  • Type the name of your Lambda function, e.g., IPFSCacheHandler.
  • Click on Save and approve the permission dialog to allow API Gateway to invoke your Lambda function.
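With Lambda proxy integration enabled, API Gateway expects the function to return a dictionary with a status code, optional headers, and a string body. A small helper for building such a response might look like this; the name proxy_response and the optional max_age parameter are assumptions for this example.

```python
import json


def proxy_response(status_code, payload, max_age=None):
    """Build a response in the shape Lambda proxy integration expects."""
    headers = {"Content-Type": "application/json"}
    if max_age is not None:
        # Content behind a CID never changes, so clients may cache it
        headers["Cache-Control"] = f"public, max-age={max_age}"
    return {
        "statusCode": status_code,
        "headers": headers,
        "body": json.dumps(payload),  # the body must be a string
        "isBase64Encoded": False,
    }


print(proxy_response(200, {"name": "example"}, max_age=3600))
```

If the function returns anything that does not match this shape, API Gateway responds to the caller with a 502 error instead of your data.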

3. Deploy the API

  1. With your method configured, click on Actions -> Deploy API.
  2. You’ll need to create a new Deployment stage, e.g., prod.
  3. Enter the stage name and click on Deploy.
  4. After deployment, you will receive an Invoke URL, which looks something like https://[id].execute-api.[region].amazonaws.com/prod.

4. Test the API

  1. You can now test the API by accessing the provided Invoke URL followed by the resource path and parameters, e.g., https://[id].execute-api.[region].amazonaws.com/prod/url/{cid}.
  2. Replace {cid} with an actual IPFS CID to test fetching and caching the data.
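You can also run this check from a short script. The sketch below builds the request URL and, when called, fetches the JSON using only the standard library. The invoke URL shown is a placeholder in the format described above, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def endpoint_url(invoke_url, cid):
    """Join the stage's Invoke URL, the url resource, and the CID."""
    return f"{invoke_url.rstrip('/')}/url/{cid}"


def fetch_metadata(invoke_url, cid, timeout=10):
    """Call the deployed API and return (status, parsed JSON body)."""
    with urlopen(endpoint_url(invoke_url, cid), timeout=timeout) as response:
        return response.status, json.loads(response.read())


# Placeholder values: substitute your own Invoke URL and a real CID
print(endpoint_url("https://abc123.execute-api.us-east-1.amazonaws.com/prod", "QmExampleCid"))
```

Calling fetch_metadata twice with the same CID is a quick way to compare a cache miss with a cache hit: the second call should return noticeably faster.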

5. Monitor and Debug

  • Use the monitoring features in API Gateway and AWS Lambda to track requests and errors.
  • Check the logs in Amazon CloudWatch for detailed information if you encounter issues.

6. Secure Your API

  • Consider using API keys, IAM roles, or Lambda authorizers to secure your API, depending on your application’s needs.
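If you choose API keys, callers send the key in the x-api-key header. The sketch below shows how a client might attach it with the standard library; the URL and key are placeholders, and the helper name request_with_api_key is hypothetical.

```python
from urllib.request import Request


def request_with_api_key(url, api_key):
    """Prepare a GET request that carries an API Gateway API key."""
    return Request(url, headers={"x-api-key": api_key})


# Placeholder URL and key; pass the request to urllib.request.urlopen to send it
req = request_with_api_key("https://example.com/prod/url/QmExampleCid", "secret")
print(req.full_url, req.header_items())
```

Keep in mind that an API key identifies a caller for throttling and quotas; for stronger access control, IAM or a Lambda authorizer is the better fit.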

By following these steps, you’ve created a RESTful API with Amazon API Gateway that triggers your Lambda function to fetch and cache data from IPFS, allowing for efficient data retrieval and improved performance for your application.

Step 5: Implement the Lambda Function Logic

The Lambda function needs to handle the following:

  • Extracting the CID from the request.
  • Checking and retrieving data from DynamoDB.
  • If necessary, fetching data from IPFS and storing it in DynamoDB and CDN.
  • Returning the cached data or newly fetched data to the user.

To implement the Lambda function logic, you’ll need to develop a Python script that handles the incoming requests, checks for cached data in DynamoDB, fetches data from IPFS if necessary, stores images in S3, updates DynamoDB, and responds appropriately. Below is a comprehensive example that integrates the steps discussed previously:

Lambda Function Logic Implementation

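import json

import boto3
import requests  # Not included in the Lambda Python runtime; bundle it with your deployment package

# Initialize AWS clients
dynamodb = boto3.resource('dynamodb')
s3 = boto3.client('s3')
table = dynamodb.Table('IPFSCache')  # Replace with your table name
s3_bucket_name = 'your-s3-bucket-name'  # Replace with your S3 bucket name


def lambda_handler(event, context):
    # Extract the CID from the path parameter (requires a {cid} path parameter on the API resource)
    cid = (event.get('pathParameters') or {}).get('cid')
    if not cid:
        return respond(400, {'error': 'Missing CID in request path'})

    # Check for data in DynamoDB
    cache_data = fetch_from_dynamodb(cid)
    if cache_data:
        # The JSON is stored as a string, so decode it before responding
        return respond(200, json.loads(cache_data['data']))

    # Fetch data from IPFS
    try:
        ipfs_data, image_cid = fetch_data_from_ipfs(cid)
    except (requests.RequestException, ValueError) as e:
        print(f"Error fetching data from IPFS: {str(e)}")
        return respond(502, {'error': 'Could not fetch data from IPFS'})

    # Save image to S3 and get its URL (stays None if there is no image)
    image_url = None
    if image_cid:
        image_url = save_image_to_s3(image_cid)

    # Update DynamoDB with new data
    store_in_dynamodb(cid, ipfs_data, image_url)

    # Return the fetched data
    return respond(200, ipfs_data)


def fetch_from_dynamodb(cid):
    try:
        response = table.get_item(Key={'cid': cid})
        return response.get('Item')
    except Exception as e:
        print(f"Error fetching item from DynamoDB: {str(e)}")
        return None


def fetch_data_from_ipfs(cid):
    ipfs_response = requests.get(f'https://ipfs.io/ipfs/{cid}', timeout=10)
    ipfs_response.raise_for_status()
    ipfs_data = ipfs_response.json()
    if not isinstance(ipfs_data, dict):
        raise ValueError('Expected a JSON object')

    # Extract the image CID; adjust this based on your data structure
    image_cid = ipfs_data.get('image_cid')

    return ipfs_data, image_cid


def save_image_to_s3(image_cid):
    image_key = f"images/{image_cid}"
    try:
        image_response = requests.get(f'https://ipfs.io/ipfs/{image_cid}', stream=True, timeout=10)
        image_response.raise_for_status()
        image_response.raw.decode_content = True  # Undo any gzip/deflate transfer encoding
        content_type = image_response.headers.get('Content-Type', 'application/octet-stream')
        s3.upload_fileobj(image_response.raw, s3_bucket_name, image_key,
                          ExtraArgs={'ContentType': content_type})
    except Exception as e:
        print(f"Error saving image to S3: {str(e)}")
        return None

    # Generate the S3 URL; consider using CloudFront URL instead
    return f"https://{s3_bucket_name}.s3.amazonaws.com/{image_key}"


def store_in_dynamodb(cid, data, image_url=None):
    # Store the JSON as a string: Boto3 rejects Python floats in items
    item = {'cid': cid, 'data': json.dumps(data)}
    if image_url:
        item['image_url'] = image_url
    try:
        table.put_item(Item=item)
    except Exception as e:
        print(f"Error storing item to DynamoDB: {str(e)}")


def respond(status_code, data):
    return {
        'statusCode': status_code,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(data)
    }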

Key Points of the Lambda Function

  • Data Fetching: The function checks DynamoDB for cached data. If not found, it fetches from IPFS.
  • Data Parsing: Customize the data parsing from IPFS to match your JSON structure, especially for extracting image CIDs.
  • Image Handling: Images are fetched from IPFS and stored in S3, and the S3 URL is stored in DynamoDB.
  • DynamoDB Updates: New data is cached in DynamoDB, including the reference to the stored image.
  • Response: The function returns the data, either from the cache or freshly fetched from IPFS.
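To illustrate the data parsing point, the sketch below looks for an image CID in a few places. Besides the image_cid key used in the function above, it also checks an image field holding an ipfs:// URI or a gateway URL; that second layout is an assumption about what your JSON might contain, so adapt it to your own structure. The function name extract_image_cid is hypothetical.

```python
def extract_image_cid(metadata):
    """Return the image CID found in the metadata, or None if there is none."""
    if not isinstance(metadata, dict):
        return None
    # Layout used in this article: the CID is stored directly
    if isinstance(metadata.get("image_cid"), str):
        return metadata["image_cid"]
    # Assumed alternative layout: an "image" field with a URI or URL
    image = metadata.get("image")
    if isinstance(image, str):
        if image.startswith("ipfs://"):
            path = image[len("ipfs://"):]
            if path.startswith("ipfs/"):
                path = path[len("ipfs/"):]
            return path.split("/", 1)[0] or None
        if "/ipfs/" in image:
            return image.split("/ipfs/", 1)[1].split("/", 1)[0] or None
    return None


print(extract_image_cid({"image": "ipfs://QmExampleCid/1.png"}))
```

Returning None when nothing matches lets the handler skip the image step and still cache the JSON.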

This comprehensive Lambda function serves as the backend logic for your serverless application, efficiently handling requests to fetch and cache data from IPFS, and ensuring quick data retrieval for your users.

Services to Use:

  • AWS Lambda: For running serverless functions.
  • Amazon DynamoDB: For key-value data storage.
  • Amazon S3: For storing images.
  • Amazon CloudFront: For CDN capabilities.
  • API Gateway: For creating RESTful endpoints.

Implementation Notes:

  • Security: Ensure your Lambda function has the necessary IAM roles and permissions to access DynamoDB, S3, and CloudFront.
  • Monitoring and Logging: Use Amazon CloudWatch to monitor and log the performance and activity of your Lambda functions.
  • Cost Optimization: Keep an eye on the number of requests, data transfer, and storage to manage costs effectively.
