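A checksum verifies the integrity of a file. Amazon S3 offers something similar in the ETag of each object, and for a simple upload the ETag is the MD5 digest of the object data. However, the ETag is not a reliable checksum of the object content: for objects uploaded in multiple parts, or encrypted with SSE-KMS or SSE-C, it is not an MD5 digest of the data. As a result, it may not match the checksum you compute yourself after downloading the file.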
To produce a true file checksum, you can use a Lambda trigger, which runs a function whenever a file is uploaded. This demo uses Java to build a Lambda function that creates MD5 and SHA-256 checksums.
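To make the ETag caveat concrete, here is a minimal sketch that compares an ETag string with a locally computed MD5 and flags ETags that look like multipart ETags (a hex digest followed by a dash and a part count). The class and method names are hypothetical and only for illustration; the ETag value is assumed to have been read from S3 beforehand.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the AWS SDK.
final class ETagCheck {

    /** Hex-encoded MD5 of the given bytes, in lower case like an S3 ETag. */
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** S3 returns ETags wrapped in double quotes; remove them before comparing. */
    static String stripQuotes(String eTag) {
        return eTag.replace("\"", "");
    }

    /** Multipart ETags have the form "<32 hex chars>-<part count>". */
    static boolean looksLikeMultipartETag(String eTag) {
        return stripQuotes(eTag).matches("[0-9a-fA-F]{32}-\\d+");
    }

    /** True only when the ETag is exactly the MD5 of the content. */
    static boolean matchesContent(String eTag, byte[] data) {
        return stripQuotes(eTag).equalsIgnoreCase(md5Hex(data));
    }
}
```

If `looksLikeMultipartETag` returns true, a mismatch with the local MD5 does not indicate corruption, which is the motivation for storing a separate checksum file.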
Prerequisites
- An AWS account with sufficient privileges to create Lambda functions and list S3 buckets;
- A user account for executing the Lambda function must be configured;
Steps
- Create Lambda function.
In the AWS Management Console, click Create function, enter a function name, set Runtime to Java 11 (Corretto), then click Create function.
- Add Trigger
Open the function you just created and click Add trigger.
- Configure new trigger
In the Add trigger menu, select S3 as the trigger, then select the target bucket in the Bucket dropdown. Optionally, restrict the trigger to certain objects by entering an object key prefix and suffix in the Prefix and Suffix text boxes.
Tick the Recursive invocation consent, then click Add.
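The recursive invocation consent matters here because the handler in this walkthrough writes its checksum files into the same bucket that triggers it, so each stored checksum file can itself fire the trigger. One possible guard, sketched below, is to skip keys that already carry a checksum extension. The class name is hypothetical, and the `.md5` and `.sha256` suffixes are assumed from the file extensions the handler uses.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical guard against re-processing generated checksum files.
final class ChecksumKeyFilter {

    // Extensions the handler appends to the original object key.
    private static final List<String> CHECKSUM_SUFFIXES = List.of(".md5", ".sha256");

    /** Returns false for keys that are themselves checksum files. */
    static boolean shouldProcess(String objectKey) {
        String lower = objectKey.toLowerCase(Locale.ROOT);
        return CHECKSUM_SUFFIXES.stream().noneMatch(lower::endsWith);
    }
}
```

Inside the handler loop, a record whose key fails `shouldProcess` would simply be skipped. Restricting the trigger with the Suffix setting is an alternative when all uploads share one extension.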
- Create Gradle library project
In a command prompt or terminal, create the project with the Gradle command below.

    gradle init

- Add dependency.
In build.gradle, add the lines below to set up the dependencies and the build task.

    dependencies {
        implementation 'com.amazonaws:aws-lambda-java-core:1.2.1'
        implementation 'com.amazonaws:aws-lambda-java-events:3.9.0'
        runtimeOnly 'com.amazonaws:aws-lambda-java-log4j2:1.2.0'
        implementation 'software.amazon.awssdk:s3:2.17.9'

        // Use JUnit Jupiter for testing.
        testImplementation 'org.junit.jupiter:junit-jupiter:5.7.1'
    }

    task("buildZip", type: Zip) {
        from compileJava
        from processResources
        into("lib") {
            from configurations.runtimeClasspath
        }
    }

    test {
        useJUnitPlatform()
    }

    java {
        sourceCompatibility = JavaVersion.VERSION_11
        targetCompatibility = JavaVersion.VERSION_11
    }

    build.dependsOn buildZip

- Add handler.
Create a new class file named ChecksumEventHandler and add the code below.

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.LambdaLogger;
    import com.amazonaws.services.lambda.runtime.RequestHandler;
    import com.amazonaws.services.lambda.runtime.events.models.s3.S3EventNotification;
    import software.amazon.awssdk.auth.credentials.AwsBasicCredentials;
    import software.amazon.awssdk.auth.credentials.StaticCredentialsProvider;
    import software.amazon.awssdk.core.ResponseBytes;
    import software.amazon.awssdk.core.sync.RequestBody;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.model.GetObjectResponse;
    import software.amazon.awssdk.services.s3.model.PutObjectRequest;
    import software.amazon.awssdk.services.s3.model.PutObjectResponse;

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.security.DigestInputStream;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Arrays;

    public class ChecksumEventHandler implements RequestHandler<S3EventNotification, Void> {
        private static final String ENVIRONMENT_VARIABLE_ACCESS_KEY_ID = "ACCESS_KEY_ID";
        private static final String ENVIRONMENT_VARIABLE_SECRET_ACCESS_KEY = "SECRET_ACCESS_KEY";

        @Override
        public Void handleRequest(S3EventNotification input, Context context) {
            // Initial settings.
            LambdaLogger logger = context.getLogger();
            String s3AccessKeyId = System.getenv(ENVIRONMENT_VARIABLE_ACCESS_KEY_ID);
            String s3SecretAccessKey = System.getenv(ENVIRONMENT_VARIABLE_SECRET_ACCESS_KEY);

            // Connect to S3 with the access key and secret access key stored in environment variables.
            AwsBasicCredentials awsCredentials = AwsBasicCredentials.create(s3AccessKeyId, s3SecretAccessKey);
            S3Client s3Client = S3Client.builder()
                    .credentialsProvider(StaticCredentialsProvider.create(awsCredentials))
                    .build();

            // Loop over the uploaded files.
            input.getRecords().forEach(o -> {
                // Get the related S3 object metadata and content.
                String bucketName = o.getS3().getBucket().getName();
                String objectKey = o.getS3().getObject().getKey();
                GetObjectRequest getObjectRequest = GetObjectRequest.builder()
                        .bucket(bucketName)
                        .key(objectKey)
                        .build();
                ResponseBytes<GetObjectResponse> response = s3Client.getObjectAsBytes(getObjectRequest);
                InputStream objectByte = new ByteArrayInputStream(response.asByteArray());

                // Generate the checksum files and store them in the same bucket.
                generateChecksumFile(logger, bucketName, objectKey, objectByte, "MD5", "md5", s3Client);
                generateChecksumFile(logger, bucketName, objectKey, objectByte, "SHA-256", "sha256", s3Client);
            });
            return null;
        }

        /**
         * Generate a checksum file with a specific algorithm and upload it.
         * @param logger injected logger.
         * @param bucketName s3 bucket name.
         * @param objectKey s3 object key name.
         * @param inputStream s3 object input stream.
         * @param algorithm checksum algorithm required.
         * @param fileExtension checksum file extension.
         * @param s3Client injected s3 client.
         */
        private void generateChecksumFile(LambdaLogger logger, String bucketName, String objectKey,
                                          InputStream inputStream, String algorithm,
                                          String fileExtension, S3Client s3Client) {
            try {
                // Rewind so that each algorithm reads the object from the start.
                inputStream.reset();
                String checksum = generateChecksum(inputStream, algorithm);
                logger.log(algorithm + " checksum = " + checksum);

                String checksumObjectName = objectKey + "." + fileExtension;
                RequestBody s3ObjectBody = RequestBody.fromString(checksum);
                PutObjectRequest putObjectRequest = PutObjectRequest.builder()
                        .bucket(bucketName)
                        .key(checksumObjectName)
                        .build();
                PutObjectResponse putObjectResponse = s3Client.putObject(putObjectRequest, s3ObjectBody);
                logger.log("Checksum stored and uploaded to bucket " + bucketName
                        + "; path = " + checksumObjectName
                        + ", e-tag = " + putObjectResponse.eTag() + ".");
            } catch (NoSuchAlgorithmException | IOException e) {
                logger.log(e.getMessage());
                logger.log(Arrays.toString(e.getStackTrace()));
            }
        }

        /**
         * Generate a checksum with a specific algorithm.
         * @param inputStream object stream to be hashed.
         * @param algorithm checksum algorithm.
         * @return Checksum as an upper-case hex string.
         * @throws NoSuchAlgorithmException Invalid input algorithm.
         * @throws IOException Exception when reading bytes.
         */
        private String generateChecksum(InputStream inputStream, String algorithm)
                throws NoSuchAlgorithmException, IOException {
            MessageDigest messageDigest = MessageDigest.getInstance(algorithm);
            DigestInputStream digestInputStream = new DigestInputStream(inputStream, messageDigest);
            byte[] buffer = new byte[4096];
            // Reading through the stream feeds every byte into the digest.
            while (digestInputStream.read(buffer) > -1) {
                // Nothing to do; the digest is updated as a side effect of read().
            }
            MessageDigest digest = digestInputStream.getMessageDigest();
            digestInputStream.close();

            byte[] checksum = digest.digest();
            StringBuilder sb = new StringBuilder();
            for (byte b : checksum) {
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        }
    }

Because the function generates files and uploads them to the S3 bucket, it needs the AWS SDK and permission to put objects into the bucket.
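The handler above reads the object once per algorithm. As an alternative sketch, the same buffer can be fed to several `MessageDigest` instances so the data is read only once. The class and method names here are hypothetical, and checked exceptions are wrapped in unchecked ones purely to keep the example short.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: compute several checksums in a single pass over a stream.
final class MultiDigest {

    /** Returns algorithm name -> upper-case hex checksum, in the order requested. */
    static Map<String, String> checksums(InputStream in, String... algorithms) {
        try {
            Map<String, MessageDigest> digests = new LinkedHashMap<>();
            for (String algorithm : algorithms) {
                digests.put(algorithm, MessageDigest.getInstance(algorithm));
            }
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                // Feed the same chunk to every digest.
                for (MessageDigest digest : digests.values()) {
                    digest.update(buffer, 0, read);
                }
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, MessageDigest> entry : digests.entrySet()) {
                result.put(entry.getKey(), toHex(entry.getValue().digest()));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm", e);
        }
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }
}
```

For example, `MultiDigest.checksums(stream, "MD5", "SHA-256")` returns both values from one read, which also works with a stream that cannot be reset.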
- Build distribution file.
In a command prompt or terminal, run the command below.

    gradle clean buildZip

- Upload file.
In the Lambda function, select the Code tab, click Upload from > .zip or .jar file, select the built file, and upload it.
- Add credential.
In the Lambda function, select Configuration > Environment variables and click Edit, then add the variables ACCESS_KEY_ID and SECRET_ACCESS_KEY (the names the handler reads) and set them to the access key and secret key.
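If one of these variables is missing, `System.getenv` returns null and the failure only surfaces later when the S3 client is built. A small sketch of failing early with a clear message follows; the `RequiredEnv` class is hypothetical, and it takes the environment as a map so it can be exercised without real environment variables.

```java
import java.util.Map;

// Hypothetical helper for reading mandatory environment variables.
final class RequiredEnv {

    /** Returns the value for name, or throws if it is absent or blank. */
    static String require(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing environment variable: " + name);
        }
        return value;
    }
}
```

In the handler this could be called as `RequiredEnv.require(System.getenv(), "ACCESS_KEY_ID")`.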
- Create Test
In the Lambda function, select the Test tab, click New event, enter an event name, then enter the JSON below. Make sure the bucket name, ARN, and object key match the bucket that triggers the function and an object in it.

    {
      "Records": [
        {
          "eventVersion": "2.0",
          "eventSource": "aws:s3",
          "awsRegion": "ap-southeast-1",
          "eventTime": "1970-01-01T00:00:00.000Z",
          "eventName": "ObjectCreated:Put",
          "requestParameters": {
            "sourceIPAddress": "127.0.0.1"
          },
          "responseElements": {
            "x-amz-request-id": "EXAMPLE123456789",
            "x-amz-id-2": "EXAMPLE123/5678abcdefghijklambdaisawesome/mnopqrstuvwxyzABCDEFGH"
          },
          "s3": {
            "bucket": {
              "name": "test-bucket",
              "arn": "arn:aws:s3:::test-bucket"
            },
            "object": {
              "key": "test.zip"
            }
          }
        }
      ]
    }

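One detail of the event format worth noting: S3 URL-encodes the object key in event notifications, so a key with spaces or special characters arrives encoded, while a simple key such as `test.zip` is unaffected. A minimal sketch of decoding the key before using it in a `GetObjectRequest` follows; the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for turning an event key back into the real object key.
final class S3KeyDecoder {

    /** Decodes percent-escapes and '+' (space) in a key taken from an S3 event. */
    static String decode(String encodedKey) {
        return URLDecoder.decode(encodedKey, StandardCharsets.UTF_8);
    }
}
```

For instance, an event key of `my+report%282021%29.zip` decodes to `my report(2021).zip`.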
- Execute test.
Click Test to run the test; it should report success.
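Once the checksum files exist in the bucket, a downloaded copy can be checked against them. The sketch below assumes the file and the text of its `.md5` or `.sha256` companion have already been downloaded; the `ChecksumVerifier` class is hypothetical. The comparison ignores case and surrounding whitespace because the handler stores upper-case hex while many tools print lower case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: verify a local file against a stored checksum string.
final class ChecksumVerifier {

    /** Upper-case hex digest of the file contents, read in chunks. */
    static String hexDigest(Path file, String algorithm) {
        try (InputStream in = Files.newInputStream(file)) {
            MessageDigest digest = MessageDigest.getInstance(algorithm);
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : digest.digest()) {
                sb.append(String.format("%02X", b));
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown algorithm: " + algorithm, e);
        }
    }

    /** True when the file's digest equals the expected hex string. */
    static boolean matches(Path file, String algorithm, String expectedHex) {
        return hexDigest(file, algorithm).equalsIgnoreCase(expectedHex.trim());
    }
}
```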