Sr. AWS Cloud Data Engineer Resume
Quincy, MA
SUMMARY
- Effective professional with 9 years of experience in Information Technology focused on Amazon Web Services, Azure, DevOps and Linux administration, including the principles of Continuous Integration, Continuous Delivery and Continuous Deployment.
- Strong experience in data analysis and data mining with large sets of structured and unstructured data, data acquisition, data validation, predictive modeling, statistical modeling, data modeling, data visualization, web crawling and web scraping. Adept in statistical programming languages such as R, Python, SAS, Apache Spark and MATLAB, as well as Big Data technologies like Hadoop, Hive and Pig.
- In-depth knowledge of DevOps management methodologies and production deployment, including compiling, packaging, deploying and application configuration.
- Worked with the customer support team to implement projects like data warehousing, data engineering, data integration automation, process design, API enablement, analytics and data quality.
- Well versed in Big Data on AWS cloud services such as EC2, S3, Glue, DynamoDB and Redshift.
- Experienced in AWS cloud computing services such as EC2, S3, Lambda, API Gateway, DynamoDB, EBS, VPC, ELB, Route 53, CloudWatch, Security Groups, CloudTrail, IAM, CloudFront, Snowball, EMR, RDS and Glacier; also worked on DNS, SSL and firewalls.
- Worked with the IAM service, creating new IAM users and groups, defining roles and policies, and configuring identity providers.
- Created alarms and trigger points in CloudWatch based on thresholds and monitored server performance, CPU utilization and disk usage.
- Experience in analyzing data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for Data Mining, Data Cleansing, Data Munging and Machine Learning.
- Extensively worked on Spark with Scala on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
- Experienced in Branching, Merging, Tagging and maintaining the version across the environments using SCM tools like Git and Subversion (SVN) on Linux platforms.
- Installed, configured and administered the Jenkins continuous integration tool on Linux machines, adding/updating plugins such as SVN, Maven and Ant.
- Proficient with Shell, Python, Ruby, Perl, PowerShell, JSON, YAML and Groovy scripting languages.
- Experience in software build tools like Apache Maven and Apache Ant, writing pom.xml and build.xml respectively.
- Experience in Azure: migrated all servers from on-premises to Kubernetes containers and wrote Perl and shell scripts for managing various enterprise applications.
- Experience in the integration of various data sources such as Oracle, SQL Server, Salesforce cloud, Teradata, JSON, XML files, flat files and APIs.
- Knowledge of Puppet as a configuration management tool to automate repetitive tasks, quickly deploy critical applications on different nodes and proactively manage change.
- Experience in configuring and managing the Chef server, and in creating and updating modules and pushing them to Chef clients.
- Worked with Ansible on-premises, writing playbooks on the workstation and pushing them to the servers.
- Good understanding of the principles and best practices of Software Configuration Management (SCM) in agile (scrum) and Waterfall methodologies.
- Wrote AWS Lambda functions in Python that invoke scripts to perform various transformations and analytics on large data sets in EMR clusters (see the sketch after this list).
- Experienced with Nagios and CloudWatch as IT infrastructure monitoring tools; knowledge of Splunk.
- Experienced with databases like Cassandra, MongoDB, MySQL and Oracle SQL.
- Broad experience in Bash, Perl and Python scripting on Linux; strong knowledge of Linux internals.
- Experienced in installing, configuring and managing Docker containers and Docker images for web and application servers such as Apache and Tomcat, integrated with an Amazon RDS MySQL database.
- Worked on several prototype projects involving clustered container orchestration and management. Contributed a MySQL cluster application to the Kubernetes project.
- Familiarity with Azure cloud solutions and architectures (Windows/Linux VMs, Data Lake, HDInsight, SQL Database, Virtual Network, Azure AD).
- Good interpersonal skills and a team-working attitude; takes initiative and is proactive in solving problems and providing the best solutions.
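A minimal sketch of the Lambda-to-EMR pattern referenced above, written in Python with Boto3. The cluster ID, S3 script path and event shape are placeholders, not details from an actual project.

```python
# Hypothetical sketch: a Lambda handler that submits a PySpark step to an
# existing EMR cluster. The cluster ID and S3 paths are placeholders.
import boto3

emr = boto3.client("emr")

def lambda_handler(event, context):
    # Pull the input prefix from the triggering event (assumed event shape).
    input_path = event.get("input_path", "s3://example-bucket/raw/")

    response = emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",  # placeholder cluster ID
        Steps=[{
            "Name": "transform-large-dataset",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit", "--deploy-mode", "cluster",
                    "s3://example-bucket/scripts/transform.py",  # placeholder script
                    "--input", input_path,
                ],
            },
        }],
    )
    return {"step_ids": response["StepIds"]}
```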
TECHNICAL SKILLS
Operating systems: Linux (Red Hat 4/5/6/7, CentOS, SUSE), Windows Server 2003/2008/2008 R2/2012/2012 R2, Windows 2000/XP/7, Ubuntu 12/13/14, Solaris 8/9/10/11, HP-UX 11.0/11.11/11.23/11.31
Cloud services: Amazon Web Services (AWS): EC2, S3, ELB, EMR, Auto Scaling, Elastic Beanstalk, CloudFront, CloudFormation, Elastic File System, RDS, DMS, VPC, Direct Connect, Route 53, CloudWatch, CloudTrail, IAM, SNS; Google Cloud; OpenStack
Python libraries/packages: NumPy, SciPy, Boto, Pickle, PySide, PyTables, DataFrames, Pandas, Matplotlib, SQLAlchemy, httplib2, urllib2, Beautiful Soup, PyQuery
Application Servers: WebLogic Application Server 9.x/10.x, Apache Tomcat, JBoss 4.x/5.x (Red Hat)
Automation tools: Puppet, Chef, Docker, Ansible, Jenkins, Kickstart, Jumpstart, Terraform, Kubernetes
Virtualization: VMware Client, Windows Hyper-V, vSphere 5.x, Datacentre Virtualization, Virtual Box, KVM, Power VM
Volume Manager: Logical Volume Manager, VERITAS Volume Manager, Solaris Volume Manager
Backup Management: Veritas NetBackup, Symantec NetBackup, EMC- Replication Manager
Networking Protocol: TCP/IP, NIS, NFS, DNS, DHCP, SMTP, FTP/SFTP, HTTP/HTTPS, NDS, Cisco Routers/Switches, WAN, LAN
Monitoring tools: Splunk, Nagios, ELK, AppDynamics, Cacti
Scripting: Perl, Python, Ruby, Bourne/Korn/Bash shell scripting, PowerShell, YAML, JSON
Storage: EMC CLARiiON CX series, NetApp.
Database technologies: Oracle, SQL Server, MySQL, NoSQL, MongoDB, Cassandra, DynamoDB, Couchbase.
Version control tools: Git, SVN, Bitbucket, CVS.
PROFESSIONAL EXPERIENCE
Sr. AWS Cloud Data Engineer
Confidential, Quincy, MA
Responsibilities:
- Developed data pipelines using Python for medical image pre-processing, training and testing.
- Developed a data platform from scratch and took part in the requirement-gathering and analysis phase of the project, documenting the business requirements.
- Designed tables in Hive and MySQL, used Sqoop to import and export data between databases and HDFS, and processed large datasets of different forms including structured, semi-structured and unstructured data.
- Designed and implemented data loading and aggregation frameworks and jobs able to handle hundreds of GBs of JSON files, using Spark, Airflow and Snowflake.
- Designed and engineered REST APIs and packages that abstract feature extraction and complex prediction/forecasting algorithms on time-series data.
- Implemented a serverless architecture using API Gateway, Lambda and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
- Good understanding of other AWS services like S3, EC2, IAM and RDS; experience with orchestration tools like AWS Step Functions and Data Pipeline.
- Was responsible for creating on-demand tables over S3 files using Lambda functions and AWS Glue with Python and PySpark (see the sketch after this list).
- Created scripts to schedule jobs, process data on Snowflake, load data from Snowflake to Databricks, generate Excel reports, and send attachments in email notifications using the template.
- Designed Network Security Groups (NSGs) to control inbound and outbound access to network interfaces (NICs), VMs and subnets.
- Configured JIRA workflows according to the needs of the team and integrated the project management features of JIRA with the build and release process.
- Hands-on experience working with AWS services like Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, IAM, etc.
- Created AWS Lambda functions and API Gateway endpoints so that data submitted via API Gateway is processed by the Lambda functions.
- Developed RESTful APIs using Python for a customer care system, providing easy access to customer and product data.
- Used AWS Glue for data transformation, validation and cleansing.
- Used AWS Data Pipeline for data extraction, transformation and loading from homogeneous or heterogeneous data sources, and built various graphs for business decision-making using the Python Matplotlib library.
- Developed data-processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment and loading into target data destinations.
- Added support for Amazon AWS S3 and RDS to host static/media files and the database into Amazon Cloud.
- Developed applications in Linux environments and am familiar with its commands; worked with the Jenkins continuous integration tool for project deployments, integrated with the Git version control system.
- Worked on importing and exporting data from Snowflake, Oracle and DB2 into HDFS and Hive using Sqoop for analysis, visualization and report generation.
- Managed the imported data from different data sources, performed transformations using Hive, Pig and MapReduce, and loaded the data into HDFS.
- Installed various Linux distributions and configured and bound systems to the domain.
- Automated Datadog dashboards for the stack through Terraform scripts.
- Wrote Terraform scripts for CloudWatch alerts.
- Exposure to Database/Data Lake & Warehouse, SQL (Oracle, Teradata, Greenplum, Postgres etc.), and ETL (Talend, Informatica)
- Secured sensitive data like DB passwords and Bitbucket passwords with Ansible Vault.
- Managed container orchestration using OpenShift, which is built on Docker and Kubernetes.
- Integrated the Docker container orchestration framework with Kubernetes by creating pods, ConfigMaps and deployments.
- Created and maintained highly scalable and fault-tolerant multi-tier AWS and Azure environments spanning multiple availability zones using Terraform and CloudFormation.
- Shadowed project leads on .NET application builds and deployments using MSBuild.
- Responsible for reviewing, auditing and evaluating cloud security solutions and designs.
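A minimal sketch of the Glue/PySpark pattern referenced above for exposing on-demand tables over S3 files. The bucket paths, field names and job wiring are assumptions for illustration, not the actual project code.

```python
# Hypothetical Glue ETL job (PySpark): read raw JSON from S3, apply a simple
# mapping, and write partition-friendly Parquet back to S3 so a crawler or
# Lambda can expose it as a catalog table. Paths and fields are placeholders.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw JSON files directly from S3 (placeholder path).
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/raw/orders/"]},
    format="json",
)

# Keep only the columns the downstream table needs (assumed fields).
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[
        ("order_id", "string", "order_id", "string"),
        ("amount", "double", "amount", "double"),
        ("order_date", "string", "order_date", "string"),
    ],
)

# Write Parquet back to S3; a Glue crawler can then register the table.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/curated/orders/"},
    format="parquet",
)
job.commit()
```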
Environment: Python, AWS, Lambda, Hadoop, Hive, Sqoop, Pig, Java, Django, PySpark, Flask, XML, MySQL, MS SQL Server, Linux, Shell Scripting, Snowflake, MongoDB, SQL, HTML5/CSS, Cassandra, JavaScript, PyCharm, Git, Glue, Step Functions, RESTful, Docker, Jenkins, JIRA, jQuery, Bootstrap, EC2, S3.
Sr. Cloud Data/Python Developer
Confidential, Juno Beach, FL
Responsibilities:
- Developed REST APIs using Python with the Flask framework and integrated various data sources including Java/JDBC, RDBMS, shell scripting, spreadsheets and text files.
- Implemented Agile Methodology for building an internal application.
- Developed AI/machine learning algorithms such as classification, regression and deep learning models using Python.
- Cleansed the data toward a normal distribution by applying techniques like missing-value treatment, outlier treatment and hypothesis testing.
- Performed data cleaning, feature scaling and feature engineering using the pandas and NumPy packages in Python.
- Worked on the AWS cloud to provision new instances, S3 storage services, and EC2 and CloudWatch services; managed CI pipelines through Jenkins.
- Created a mechanism to import third party vendor orders and distributor information data using API endpoint extraction
- Expertise in Snowflake, creating and maintaining tables and views.
- Configured AWS IAM and security groups in public and private subnets in a VPC. Managed network security using load balancers, Auto Scaling, security groups and NACLs.
- Utilized AWS CLI to automate backups of ephemeral data-stores to S3 buckets, EBS and create nightly AMIs for mission critical production servers as backups.
- Used Ansible for configuration management of hosted instances within AWS; configured networking of the Virtual Private Cloud (VPC).
- Experienced with event-driven and scheduled AWS Lambda functions to trigger various AWS resources.
- Integrated Lambda with SQS and DynamoDB using Step Functions to iterate through a list of messages and update the status in a DynamoDB table.
- Used the AWS Glue Data Catalog with crawlers to pull data from S3 and perform SQL query operations.
- Automated the infrastructure in Google Cloud by using Deployment Manager Templates for various services in GCP.
- Handled installation, administration and configuration of ELK stack on AWS and performed Log Analysis.
- Implemented and maintained monitoring and alerting of production and corporate servers/storage using CloudWatch.
- Responsible for build and deployment automation using VMware ESX, Docker, Kubernetes containers and Chef.
- Focused on containerization and immutable infrastructure; Docker has been core to this experience, along with Kubernetes. Experienced in using Docker Swarm and deploying Spring Boot applications.
- Used Docker and OpenShift to manage microservices for development and testing; used the OpenShift platform to build PaaS applications.
- Used the Python Boto3 SDK to configure AWS services such as Glue, EC2 and S3 (see the sketch after this list).
- Managed GitLab and Bitbucket accounts, providing access to developers and storing the source code.
- Administered Couchbase and Redis caching clusters and Atlassian Jira/Confluence ticketing and collaboration applications.
- Maintained build related scripts developed in shell for Maven builds. Created and modified build configuration files including POM.xml.
- Scripted administration tasks using the CLI, PowerShell, shell and Ruby. Built upstream and downstream jobs in Jenkins to build and deploy onto different environments.
- Built and engineered servers on Ubuntu and RHEL Linux. Provisioned virtual servers on VMware and ESX servers using Virtual Cloud.
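A minimal sketch of the kind of Boto3 automation referenced above; the crawler name, instance ID and bucket are placeholders assumed for illustration.

```python
# Hypothetical helpers showing Boto3 usage across Glue, EC2 and S3.
import boto3

glue = boto3.client("glue")
ec2 = boto3.client("ec2")
s3 = boto3.client("s3")

def refresh_catalog(crawler_name: str = "example-orders-crawler") -> None:
    """Kick off the Glue crawler that rebuilds the catalog tables (placeholder name)."""
    glue.start_crawler(Name=crawler_name)

def instance_state(instance_id: str = "i-0123456789abcdef0") -> str:
    """Return the current state of a single EC2 instance (placeholder ID)."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    return resp["Reservations"][0]["Instances"][0]["State"]["Name"]

def upload_report(path: str, bucket: str = "example-reports-bucket") -> None:
    """Push a locally generated report up to S3 (placeholder bucket)."""
    s3.upload_file(path, bucket, f"reports/{path.split('/')[-1]}")
```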
Environment: Python, MapReduce, Hive, HDFS, Pig, Sqoop, Hortonworks, Flume, HBase, Oracle, Snowflake, Teradata, Tableau, Lambda, Step Functions, Glue, Unix/Linux, Hadoop, Oracle/SQL, DB2, JIRA, AWS
Senior Cloud Data Engineer
Confidential, Milford, CT
Responsibilities:
- Designed and built a multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day.
- Worked on server infrastructure development on the AWS cloud, with extensive usage of Virtual Private Cloud (VPC), CloudFormation, Lambda, CloudFront, CloudWatch, IAM, EBS, Security Groups, Auto Scaling, DynamoDB, Route 53 and CloudTrail.
- Worked on Big Data on AWS cloud services such as EC2, S3, EMR and DynamoDB.
- Set up Azure virtual appliances (VMs) to meet security requirements as software-based appliance functions (firewall, WAN optimization and intrusion detection).
- Managed security groups on AWS, focusing on high availability, fault tolerance and auto scaling using Terraform templates, along with continuous integration and continuous deployment using AWS Lambda and AWS CodePipeline.
- Developed SSRS reports and SSIS packages to extract, transform and load data from various source systems.
- Implementing and Managing ETL solutions and automating operational processes.
- Optimizing and tuning the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Created several types of data visualizations using Python and Tableau.
- Performed preliminary data analysis using descriptive statistics and handled anomalies such as removing duplicates and imputing missing values.
- Developed Python scripts to back up EBS volumes using AWS Lambda and CloudWatch (see the sketch after this list).
- Used pandas, OpenCV, NumPy, Seaborn, TensorFlow, Keras, Matplotlib, scikit-learn and NLTK in Python to develop data pipelines and various machine learning algorithms.
- Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
- Managed Build results in Jenkins and deployed using workflows.
- Maintained and tracked inventory using Jenkins and set alerts when the servers are full and need attention.
- Created Entity Relationship Diagrams (ERD), Functional diagrams, Data flow diagrams and enforced referential integrity constraints and created logical and physical models using Erwin.
- Created ad hoc queries and reports to support business decisions using SQL Server Reporting Services (SSRS).
- Analyzed existing application programs and tuned SQL queries using the execution plan, Query Analyzer, SQL Profiler and Database Engine Tuning Advisor to enhance performance.
- Involved in forward engineering of the logical models to generate the physical model and data models using Erwin, with subsequent deployment to the Enterprise Data Warehouse.
- Wrote various data normalization jobs for new data ingested into Redshift.
- Created various complex SSIS/ETL packages to Extract, Transform and Load data
- Advanced knowledge on Confidential Redshift and MPP database concepts.
- Migrated the on-premises database structure to the Confidential Redshift data warehouse.
- Was responsible for ETL and data validation using SQL Server Integration Services.
- Implemented Workload Management (WLM) in Redshift to prioritize basic dashboard queries over more complex, longer-running ad hoc queries. This allowed for a more reliable and faster reporting interface, giving sub-second query response for basic queries.
- Worked on publishing interactive data visualization dashboards, reports and workbooks on Tableau and SAS Visual Analytics.
- Used Hive SQL, Presto SQL and Spark SQL for ETL jobs, choosing the right technology to get the job done.
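A minimal sketch of the EBS backup pattern referenced above, assuming volumes are selected by a hypothetical "Backup" tag and the function is invoked on a CloudWatch Events schedule.

```python
# Hypothetical scheduled Lambda that snapshots every EBS volume carrying a
# placeholder "Backup" tag. The tagging scheme is an assumption.
from datetime import datetime, timezone
import boto3

ec2 = boto3.client("ec2")

def lambda_handler(event, context):
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "tag:Backup", "Values": ["true"]}]  # assumed tag
    )["Volumes"]

    snapshot_ids = []
    for volume in volumes:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        snap = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"Nightly backup {stamp} for {volume['VolumeId']}",
        )
        snapshot_ids.append(snap["SnapshotId"])

    return {"created": snapshot_ids}
```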
Environment: AWS, Azure, Python, Lambda, Jenkins, Redshift, Terraform, Java, ETL, SQL Server, Erwin, Oracle, Informatica, RDS, NOSQL, MySQL, PostgreSQL
Python Developer/Data Engineer
Confidential, Ashburn, VA
Responsibilities:
- Maintained and Administered GIT Source Code Tool.
- Created and maintained Subversion/GIT repositories, branches, tags and performed merges in stash and GIT.
- Managed version control tool Git to version code changes to help developers/programmers branch/merge/revert code.
- Experience with software development methodologies such as Waterfall and Agile (Scrum).
- Migrated physical servers to VMs using the VMware P2V converter in JBoss web environments.
- Developed and maintained Perl/shell scripts for build and release tasks.
- Extensively used the Ant tool to do builds, integrated Ant with Eclipse and did local builds.
- Created and maintained the shell/Perl deployment scripts for WebLogic web application servers.
- Initially used Ant, writing build.xml to build Java/J2EE applications; later migrated to Maven.
- Built Java code and .NET code on different Jenkins servers as per the schedule.
- Used the Puppet server and workstation to manage and configure nodes; experienced in writing Puppet manifests to automate configuration of a broad range of services.
- Designed Terraform templates to launch EC2 instances with IAM, VPC, subnets, security groups, route tables and an internet gateway.
- Worked in designing and implementing continuous integration system using Jenkins by creating Python and Shell scripts.
- Automated containerization of new and existing applications as well as deployment and management of complex runtime environments like Kubernetes.
- Built Puppet Enterprise modules using the Puppet DSL to automate infrastructure provisioning and configuration management for existing infrastructure by deploying Puppet, Puppet Dashboard and PuppetDB.
- Defined AWS security groups that acted as virtual firewalls controlling the traffic allowed to reach one or more AWS EC2 instances (see the sketch after this list).
- Wrote Chef cookbooks for various DB configurations to modularize and optimize end-product configuration, converting production support scripts to Chef recipes and provisioning AWS servers using Chef recipes.
- Integrated Docker container based test infrastructure to Jenkins CI test flow and set up build environment integrating with Git and Jira to trigger builds using Web Hooks and Slave Machines.
- Worked with logging/monitoring tools such as Splunk, Logstash, CloudWatch and Nagios.
- Used Jira to track issues and Change Management.
- Extensive knowledge in JIRA and knowledge on other CI tools like Bamboo.
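A minimal sketch of defining a security group programmatically, as referenced above; shown here with Boto3 as an illustration (the actual work may have used Terraform or the console), with a placeholder VPC ID and an HTTPS-only ingress rule.

```python
# Hypothetical sketch: create a security group acting as a virtual firewall
# that allows inbound HTTPS only. Names and IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

def create_web_security_group(vpc_id: str = "vpc-0123456789abcdef0") -> str:
    sg = ec2.create_security_group(
        GroupName="web-https-only",  # placeholder name
        Description="Allow inbound HTTPS to the web tier",
        VpcId=vpc_id,
    )
    ec2.authorize_security_group_ingress(
        GroupId=sg["GroupId"],
        IpPermissions=[{
            "IpProtocol": "tcp",
            "FromPort": 443,
            "ToPort": 443,
            "IpRanges": [{"CidrIp": "0.0.0.0/0", "Description": "HTTPS from anywhere"}],
        }],
    )
    return sg["GroupId"]
```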
DevOps Engineer
Confidential
Responsibilities:
- Worked with QA to facilitate verification of releases and was involved in running multiple builds at a time.
- Created and deployed builds on WebLogic, tested in the build environment and released to the test team on schedule.
- Involved in installing, updating and configuring UNIX and Windows build releases of cloud products in Linux and Windows environments, using PowerShell, TFS and Python scripting.
- Created Git configuration records for builds using derived objects generated during the build audit process; implemented and maintained the branching and build/release strategies utilizing Git.
- Used Kubernetes and Docker for the runtime environment for the CI/CD system to build, test and deploy
- Wrote Chef recipes for various applications and deployed them in AWS using Terraform.
- Created and maintained build wrapper scripts using Perl. Built Java code and .NET code on Jenkins servers.
- Presented on the View Object pattern in web app automation with C#, Ruby, Rake, PowerShell, Selenium and TeamCity.
- Responsible for installing Jenkins master and slave nodes, installing the Git plugin, scheduling jobs using the Poll SCM option, and creating build scripts using Maven for Java projects.
- Merged release branches to the trunk after the production release and resolved the conflicts if any during the merge in Subversion.
- Developed Rational ClearQuest schemas and tailored the tools with custom Perl and VB scripts.