Tutorial: Pull Twitter Hashtags
===============================

In this tutorial we are going to pull data about the popularity of several different topics
in the hashtags of tweets. Imagine that we have already classified some popular hashtags
into several specific groups. Now we want to pull the data about their usage over the last days.

Every day we want to read the data only for the 2 last days, so we add the chunking parameters:
``"period": "last_2_days", "isolate_days": true``

We shall pretend that pulling the data for every keyword takes several minutes. In this case
we add another parameter ``"isolate_words": true`` to make sure that each word is processed
in a different Lambda execution.

The following payload shall become our Job. This means that we shall chunk it per specific
word / day combination and create an independent task for the Worker Lambda for each chunk.

.. code-block:: json

    {
      "topics": {
        "cars": {
          "words": ["toyota", "mazda", "nissan", "racing", "automobile", "car"]
        },
        "food": {
          "words": ["recipe", "cooking", "eating", "food", "meal", "tasty"]
        },
        "shows": {
          "words": ["opera", "cinema", "movie", "concert", "show", "musical"]
        }
      },
      "period": "last_2_days",
      "isolate_days": true,
      "isolate_words": true
    }

Now the time has come to create the actual Lambda.

Register Twitter App
--------------------

| First of all you will have to register your own Twitter API credentials: https://developer.twitter.com/
| Submitting the application takes 2-3 minutes, but after that you may have to wait several hours (or even days) for approval.

You can submit the application now and add the credentials to the config later.
The Lambda shall handle the missing credentials gracefully.

Package Lambda Code
-------------------

Creating the Lambda is very similar to the way we deployed the ``sosw`` Essentials.
We use the same scripts and deployment workflow. Feel free to use your own favourite
method or contribute to upgrade this one.

.. code-block:: bash

    # Get your AccountId from EC2 metadata. Assuming you run this on EC2.
    ACCOUNT=`curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/info/ | \
        grep AccountId | awk -F "\"" '{print $4}'`

    # Set your bucket name
    BUCKETNAME=sosw-s3-$ACCOUNT

    FUNCTION="sosw_tutorial_pull_tweeter_hashtags"
    FUNCTIONDASHED=`echo $FUNCTION | sed s/_/-/g`

    cd /var/app/sosw/examples/workers/$FUNCTION

    # Install the sosw package locally. Its only dependency is boto3, which is already
    # present in the Lambda container, so skipping it saves a lot of package size.
    # Install other requirements directly into the package.
    pip3 install sosw --no-dependencies --target .
    pip3 install -r requirements.txt --target .

    # Make a source package. TODO: skip 'dist-info' and 'test' paths.
    zip -qr /tmp/$FUNCTION.zip *

    # Upload the file to S3, so that AWS Lambda will be able to easily take it from there.
    aws s3 cp /tmp/$FUNCTION.zip s3://$BUCKETNAME/sosw/packages/

    # Package and deploy the CloudFormation stack for the Function.
    # It will create the Function and a custom IAM role for it with permissions to
    # access the required DynamoDB tables.
    aws cloudformation package --template-file $FUNCTION.yaml \
        --output-template-file /tmp/deployment-output.yaml --s3-bucket $BUCKETNAME

    aws cloudformation deploy --template-file /tmp/deployment-output.yaml \
        --stack-name $FUNCTIONDASHED --capabilities CAPABILITY_NAMED_IAM

This pattern has created the IAM Role for the function, the Lambda function itself
and a DynamoDB table to save the data to.
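The repository already contains the Worker code deployed by this stack, so you do not need to
write it yourself, but a minimal sketch helps to see what each chunked task looks like from the
Worker's side. Everything below is illustrative: the event field names, the ``STATS_TABLE``
environment variable and the ``count_tweets`` stub are assumptions for this sketch, not the
actual code from the repository (the real Worker is built on ``sosw`` and also reports task
completion back to the Essentials).

.. code-block:: python

    import os
    import boto3

    dynamodb = boto3.resource('dynamodb')


    def count_tweets(word: str, day: str) -> int:
        """Hypothetical placeholder for the real Twitter API call that uses
        the credentials registered above."""
        raise NotImplementedError


    def lambda_handler(event, context):
        # Thanks to `isolate_words` and `isolate_days`, each task the Worker
        # receives carries exactly one word and one day. The field names here
        # are assumptions; check the repository code for the real ones.
        word = event['words'][0]
        day = event['date_list'][0]

        count = count_tweets(word, day)

        # Persist the result to the DynamoDB table created by the stack.
        table = dynamodb.Table(os.environ['STATS_TABLE'])
        table.put_item(Item={'word': word, 'date': day, 'count': count})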
All these resources still fall under the AWS Free Tier, as long as you do not abuse them.

If you later make any changes to the application and need to re-deploy a new version, you may
use the following script. It validates the changes in the CloudFormation template and also
publishes the new version of the Lambda code package:

.. hidden-code-block:: bash
    :label: Show script
    # Get your AccountId from EC2 metadata. Assuming you run this on EC2.
    ACCOUNT=`curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/info/ | \
        grep AccountId | awk -F "\"" '{print $4}'`

    # Set your bucket name
    BUCKETNAME=sosw-s3-$ACCOUNT

    FUNCTION="sosw_tutorial_pull_tweeter_hashtags"
    FUNCTIONDASHED=`echo $FUNCTION | sed s/_/-/g`

    cd /var/app/sosw/examples/workers/$FUNCTION

    # Make a source package.
    zip -qr /tmp/$FUNCTION.zip *

    # Upload the file to S3, so that AWS Lambda will be able to easily take it from there.
    aws s3 cp /tmp/$FUNCTION.zip s3://$BUCKETNAME/sosw/packages/

    aws cloudformation package --template-file $FUNCTION.yaml \
        --output-template-file /tmp/deployment-output.yaml --s3-bucket $BUCKETNAME

    aws cloudformation deploy --template-file /tmp/deployment-output.yaml \
        --stack-name $FUNCTIONDASHED --capabilities CAPABILITY_NAMED_IAM

    aws lambda update-function-code --function-name $FUNCTION --s3-bucket $BUCKETNAME \
        --s3-key sosw/packages/$FUNCTION.zip --publish

Upload configs
--------------

In order for this function to be managed by ``sosw``, we have to register it as a Labourer
in the configs of the ``sosw`` Essentials. As you probably remember, the configs live in the
``config`` DynamoDB table.

Especially for this tutorial we have a nice script to inject configs. It finds the JSON files
of the worker in ``FUNCTION/config`` and *"injects"* the ``labourer.json`` contents into the
existing configs of the Essentials. It also creates a config for the Worker Lambda itself out
of ``self.json``. Add your Twitter credentials to the placeholders there once you receive them
and re-run the uploader.

.. code-block:: bash

    cd /var/app/sosw/examples
    pipenv run python3 config_updater.py sosw_tutorial_pull_tweeter_hashtags

After updating the configs we must reset the Essentials so that they read fresh configs from
DynamoDB. There is currently no special AWS API endpoint for this, so we simply re-deploy the
Essentials.

.. code-block:: bash

    # Get your AccountId from EC2 metadata. Assuming you run this on EC2.
    ACCOUNT=`curl http://169.254.169.254/latest/meta-data/identity-credentials/ec2/info/ | \
        grep AccountId | awk -F "\"" '{print $4}'`

    # Set your bucket name
    BUCKETNAME=sosw-s3-$ACCOUNT

    for name in `ls /var/app/sosw/examples/essentials`; do
        echo "Deploying $name"

        FUNCTIONDASHED=`echo $name | sed s/_/-/g`

        cd /var/app/sosw/examples/essentials/$name

        zip -qr /tmp/$name.zip *

        # Upload the fresh package to S3 before updating the function code.
        aws s3 cp /tmp/$name.zip s3://$BUCKETNAME/sosw/packages/

        aws lambda update-function-code --function-name $name --s3-bucket $BUCKETNAME \
            --s3-key sosw/packages/$name.zip --publish
    done

Schedule task
-------------

| Congratulations!
| You are ready to **run** the tutorial.

You just need to call the ``sosw_scheduler`` Lambda with the Job that we constructed at the
very beginning. The payload for the Scheduler must have the ``lambda_name`` of the Labourer
(the name of the Worker function) and the optional ``job``.

.. hidden-code-block:: json
    :label: See full payload
{ "lambda_name": "sosw_tutorial_pull_tweeter_hashtags", "job": { "topics": { "cars": { "isolate_words": true, "words": ["toyota", "mazda", "nissan", "racing", "automobile", "car"] }, "food": { "isolate_words": true, "words": ["recipe", "cooking", "eating", "food", "meal", "tasty"] }, "shows": { "isolate_words": true, "words": ["opera", "cinema", "movie", "concert", "show", "musical"] } }, "period": "last_2_days", "isolate_days": true, "isolate_words": true } } This JSON payload is also available in the file ``FUNCTION/config/task.json``. .. code-block:: bash cd /var/app/sosw/examples PAYLOAD=`cat workers/sosw_tutorial_pull_tweeter_hashtags/config/task.json` aws lambda invoke --function-name sosw_scheduler \ --payload "$PAYLOAD" /tmp/output.txt && cat /tmp/output.txt