AWS Big Data Engineer Resume
Atlanta, GA
SUMMARY
- Around 4 years of work experience in IT as a Big Data Engineer, AWS Data Engineer and Programmer Analyst.
- Proven expertise in delivering major software solutions for various high-end clients, meeting business requirements such as big data processing, ingestion, analytics and cloud migration from on-premises to the AWS Cloud.
- Expertise in deploying cloud-based services with Amazon Web Services (Databases, Migration, Compute, IAM, Storage, Analytics, Network & Content Delivery, Lambda and Application Integration).
- Worked on the end-to-end Software Development Life Cycle in an Agile environment using Scrum methodology.
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB and ElastiCache (Memcached & Redis).
- Hands-on expertise with AWS storage services such as S3, EFS and Storage Gateway, with partial familiarity with Snowball.
- Strong understanding of AWS compute services such as EC2, Elastic MapReduce (EMR) and EBS, including accessing instance metadata.
- In-depth understanding of monitoring/auditing tools in AWS such as CloudWatch and CloudTrail.
- Developed and deployed various Lambda functions in AWS with the built-in AWS Lambda libraries, and also deployed Lambda functions in Scala with custom libraries.
- Expert understanding of AWS DNS services through Route 53, including Simple, Weighted, Latency, Failover and Geolocation routing policies.
- Strong understanding of AWS encryption both in transit (SSL/TLS) and at rest (SSE-S3, SSE-KMS, CloudHSM, AES-256, SSE-C), on both the client and server side.
- Expert understanding of AWS network and content delivery services through Virtual Private Cloud (VPC).
- Hands-on expertise and functional knowledge of IP addressing, access control lists, subnets, NAT instances & gateways, VPC peering, custom VPCs and bastion hosts.
- Expertise with AWS application integration services such as Simple Queue Service (SQS), Simple Notification Service (SNS), Simple Workflow Service (SWF), SES, Elastic Transcoder and Kinesis (Streams, Firehose & Analytics).
- Hands-on experience with data analytics services such as Athena, the Glue Data Catalog & QuickSight.
- Worked on ETL migration by developing and deploying AWS Lambda functions that build a serverless data pipeline, writing to the Glue Data Catalog so the data can be queried from Athena.
- Wrote CloudFormation templates in JSON for the network and content delivery layers of the AWS cloud environment.
- Strong understanding of the Hadoop framework and big data ecosystems. Proven expertise in big data ingestion, processing and workflow design.
- Worked on Sqoop for data ingestion, Hive & Spark for data processing and Oozie for designing complex workflows in the Hadoop framework.
- Sqoop ingestion from a MuleSoft-integrated SAP HANA source, loading HANA views to HDFS/S3.
- Kafka: wrote to topics using Kerberos tickets, generating producer sources and consuming topics with code-based reads/writes (see the sketch after this list).
- Worked on Oozie, Airflow & SWF for complex workflows. Familiarity with HUE.
- Worked on a Scala code base for Apache Spark, performing actions and transformations on RDDs, DataFrames & Datasets using Spark SQL and Spark Streaming contexts.
- Accessed and ingested Hive tables into Apache Kudu for data warehousing after performing record joins using the KuduContext in Apache Spark.
- Good understanding of Apache Spark's high-level architecture and performance-tuning patterns.
- Strong knowledge of Hive (architecture, Thrift server), HQL, Beeline and other third-party JDBC connectivity to Hive.
- Migrated Hive & MapReduce jobs to EMR and Qubole, automating the workflows using Airflow.
- Ease of operability with the Unix file system through the command-line interface.
- Good working knowledge of Unix commands, such as changing file and group permissions.
- Handled server-to-server key-based access for various owners and groups in Unix environments.
- Addressed complex POCs from the technical end according to business requirements.
- Wrote test cases to achieve unit-test coverage.
- Analyzed raised issues, estimated root cause based on their criticality and handled them through to resolution.
- Effectively communicate with business units and stakeholders and provide strategic solutions according to the client's requirements.
- Cloud & data enthusiast, always curious about upgrading technology.
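As a hedged illustration of the Kerberized Kafka writes mentioned above, the sketch below shows a minimal Scala producer authenticating over SASL/GSSAPI. The broker list, topic name and payload are placeholders, and the JVM is assumed to be launched with a JAAS configuration pointing at the Kerberos ticket cache or keytab.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.StringSerializer

object KerberizedProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Placeholder broker list; replace with the cluster's bootstrap servers.
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)
    // Use the Kerberos ticket via SASL/GSSAPI; the JVM is started with
    // -Djava.security.auth.login.config=<jaas.conf> supplying the keytab/ticket cache.
    props.put("security.protocol", "SASL_PLAINTEXT")
    props.put("sasl.kerberos.service.name", "kafka")

    val producer = new KafkaProducer[String, String](props)
    try {
      // "ingest.events" is a hypothetical topic name.
      producer.send(new ProducerRecord[String, String]("ingest.events", "record-key", """{"id": 1}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```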
TECHNICAL SKILLS
Big Data Skillset - Frameworks & Environments: Cloudera CDH, Hortonworks HDP, Hadoop 1.0, Hadoop 2.0, HDFS, MapReduce, Pig, Hive, Impala, HBase, Data Lake, Cassandra, MongoDB, Mahout, Sqoop, Oozie, Zookeeper, Flume, Splunk, Spark, Storm, Kafka, YARN, Falcon, Avro.
Amazon Web Services (AWS): Elastic MapReduce, EC2 Instances, Airflow, Amazon S3, Amazon Redshift, DynamoDB, ElastiCache, Storage Gateway, Route 53 DNS, Encryption, Virtual Private Cloud, SQS, SNS, SWF, Athena, Glue, CloudWatch Logs, IAM Roles, Ganglia, EMRFS, s3cmd (batches), Ruby EMR Utility (monitoring), Boto, Amazon Cognito, AWS API Gateway, AWS Lambda, Kinesis (Streams, Firehose & Analytics).
Java & J2EE Technologies: Core Java (Java 8), Hibernate framework, Spring framework, JSP, Servlets, Java Beans, JDBC, JSON, Java Sockets, JavaScript, jQuery, XML, …, HTML, CSS, SOAP.
Messaging Services: JMS, MQ Series, MDB. J2EE MVC Frameworks: Struts … Struts 2.1, Spring 3.2 MVC, Spring Web Flow.
IDE Tools: Eclipse, NetBeans, Spring Tool Suite, Hue (Cloudera specific).
Databases & Application Servers: Oracle 8i/9i/10g/11i, MySQL, DB2 8.x/9.x, MS Access, Microsoft SQL Server 2000, PostgreSQL, Cassandra, HBase, MongoDB.
Other Tools: PuTTY, WinSCP, FileZilla, DataLake, Talend, Tableau, GitHub, SVN, CVS.
PROFESSIONAL EXPERIENCE
AWS Big Data Engineer
Confidential, Atlanta, GA
Responsibilities:
- Worked on Cloudera CDH installation on EC2 instances, ensuring the setup could run MapReduce jobs using Hive, Hue & Spark.
- Worked on ETL jobs in Spark using the SQL, Hive, Streaming & Kudu contexts.
- Converted ETL pipelines to a Scala code base and enabled data access to & from S3.
- Performed Sqoop ingestion through Oozie workflows from MS SQL Server and SAP HANA views.
- Implemented Slowly Changing Dimensions (SCDs) while populating the data to S3.
- Performed record joins using Hive and Spark Datasets and pushed the resulting tables to Apache Kudu (see the Kudu sketch after this list).
- Parsed PostgreSQL DDL into an Amazon Redshift-compatible form while building the data warehouse.
- Migrated raw data to S3 in the Amazon cloud and performed refined data processing.
- Wrote CloudFormation templates in JSON for content delivery with Cross-Region Replication using Amazon Virtual Private Cloud.
- Implemented columnar data storage, advanced compression and massively parallel processing using the multi-node Redshift feature.
- Contributed code to the next-generation DataLake Accelerator, which leverages the Scala Spark APIs to process records based on the Datasets and schema files provided as parameters.
- Extensive expertise using the core Spark APIs and processing data on an EMR cluster.
- Created the Log4j properties file and generated the exception matrix to implement logging for the DataLake Accelerator.
- Reviewed complex ETL pipelines in Ruby-based applications and provided high-level design (HLD) input for migrating them to Scala-based ETL pipelines.
- Wrote AWS Lambda functions in Scala with cross-functional dependencies, generating custom libraries for deploying the Lambda functions in the cloud (see the Lambda sketch after this list).
- Ingested raw data into S3 from Kinesis Firehose, triggering a Lambda function that put refined data into another S3 bucket and wrote to an SQS queue.
- Wrote to the Glue metadata catalog, which in turn enables querying the refined data from Athena, achieving a serverless querying environment.
- Created QuickSight reports for customer-based deliverable requests.
- Performed the migration of Hive and MapReduce jobs from on-premises MapR to the AWS cloud using EMR and Qubole.
- Tuned the MapReduce-level performance of the Hive jobs.
- Automated the complex workflows using the Airflow workflow handler.
- Implemented encryption using AWS KMS and the client- and server-side SSE features.
- Leveraged build tools such as SBT and Maven for building the Spark applications.
- Performed API calls using Python scripting and performed reads and writes to S3 using the Boto3 library.
- Processed JSON data by performing API calls and landing the data in S3 for refinement.
- Good familiarity with Scala, Python and Java for working in the application code bases.
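The following is a minimal, hypothetical sketch of the Hive-to-Kudu record join described above, using the kudu-spark KuduContext. The table names, join key and Kudu master address are assumptions, and the destination Kudu table is assumed to already exist.

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

object HiveToKuduJob {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so the warehouse tables are visible to Spark SQL.
    val spark = SparkSession.builder()
      .appName("hive-to-kudu")
      .enableHiveSupport()
      .getOrCreate()

    // Hypothetical Hive tables; the real schemas and names differ.
    val orders    = spark.table("staging.orders")
    val customers = spark.table("staging.customers")

    // Record join performed in Spark before the warehouse load.
    val joined = orders.join(customers, Seq("customer_id"))

    // Kudu master address and destination table are placeholders;
    // the Kudu table must already exist for upsertRows to succeed.
    val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)
    kuduContext.upsertRows(joined, "impala::dw.orders_enriched")

    spark.stop()
  }
}
```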
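Below is a hedged sketch of the Scala Lambda flow described above: an S3 event from the Firehose landing bucket triggers the function, a stubbed "refine" step writes to a refined bucket, and a notification goes to SQS. The bucket names, queue URL and refine logic are placeholders; it assumes aws-lambda-java-core/events and the AWS SDK for Java v1 on the classpath, with Scala 2.13 collection converters.

```scala
import com.amazonaws.services.lambda.runtime.{Context, RequestHandler}
import com.amazonaws.services.lambda.runtime.events.S3Event
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.sqs.AmazonSQSClientBuilder

import scala.jdk.CollectionConverters._ // Scala 2.13; use JavaConverters on 2.12

class RefineHandler extends RequestHandler[S3Event, String] {
  private val s3  = AmazonS3ClientBuilder.defaultClient()
  private val sqs = AmazonSQSClientBuilder.defaultClient()

  // Placeholder names; the real bucket and queue differ.
  private val refinedBucket = "refined-data-bucket"
  private val queueUrl      = "https://sqs.us-east-1.amazonaws.com/123456789012/refined-events"

  override def handleRequest(event: S3Event, context: Context): String = {
    for (record <- event.getRecords.asScala) {
      val srcBucket = record.getS3.getBucket.getName
      val srcKey    = record.getS3.getObject.getKey

      // "Refine" step stubbed out: re-read the raw object and write a cleaned copy.
      val raw     = s3.getObjectAsString(srcBucket, srcKey)
      val refined = raw.trim // real logic would parse and clean the payload
      s3.putObject(refinedBucket, srcKey, refined)

      // Notify downstream consumers that a refined object is available.
      sqs.sendMessage(queueUrl, s"""{"bucket":"$refinedBucket","key":"$srcKey"}""")
    }
    "ok"
  }
}
```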
Environment: Amazon Web Services, Elastic MapReduce cluster, EC2s, CloudFormation, Oozie, Airflow, Amazon S3, Amazon Redshift, DynamoDB, CloudWatch, IAM Roles, EMRFS, s3cmd (batches), Ruby EMR Utility (monitoring), Boto, Amazon Cognito, AWS API Gateway, Glue, Athena, SAP HANA Views, Apache Kudu, HUE, Kerberos, AWS Lambda, Kinesis (Streams), Hive, Scala, Python, HBase, Sqoop, SBT, Maven, Amazon RDS (Aurora), Apache Spark, Spark SQL, Shell Scripting, Tableau, Cloudera.
Big Data Cloud Engineer
Confidential, Milwaukee, WI
Responsibilities:
- Implemented Hadoop jobs on an EMR cluster, running several Spark, Hive & MapReduce jobs to process data for recommendation engines, transactional fraud analytics and behavioral insights.
- Team player for DataLake production support; the DataLake typically supports over 750 million searches/day, 9 billion pricing inventory updates/day and 14 trillion automated transactions/year, generating around 1.2 TB of data daily.
- Populated the DataLake by leveraging Amazon S3, with interactions made possible through Amazon Cognito/Boto/s3cmd.
- Parsed data from S3 through Python API calls via the Amazon API Gateway, generating batch sources for processing.
- Scheduled batch jobs through AWS Batch, performing data processing jobs by leveraging the Apache Spark APIs through Scala.
- Performed Spark jobs with the Spark Core and Spark SQL libraries for processing the data (see the sketch after this list).
- Great familiarity with Spark RDDs, actions & transformations, DataFrames, Spark SQL and Spark file formats.
- Expert knowledge of the Hadoop & Spark high-level architectures.
- Hands-on experience in implementing and deploying Elastic MapReduce (EMR) clusters leveraging Amazon Web Services with EC2 instances.
- Good familiarity with AWS services like DynamoDB, Redshift, Simple Storage Service (S3) and Amazon Elasticsearch Service.
- Great knowledge of EMRFS, S3 bucketing, m3.xlarge and c3.4xlarge instance types, IAM roles & CloudWatch Logs.
- The cluster was deployed through the YARN scheduler and its size is auto-scalable.
- Used Apache Airflow for complex workflow automation; process automation is done by wrapper scripts through shell scripting.
- Great familiarity with the Linux environment and user groups.
- Worked with JSON, CSV, SequenceFile and text file formats.
- Imported data from DynamoDB to Redshift in batches using AWS Batch with the TWS scheduler.
- Used Flask and Ganglia for monitoring Spark jobs.
- Involved in Data Validation and fixing discrepancies by working in coordination with the Data Integration and Infra Teams.
- Situational awareness of the staging environment, active JIRAs and production support issues.
- The build tool used for managing dependencies was the Simple Build Tool (SBT). Worked with data owners, business units, the Data Integration team and customers in a fast-paced Agile/Scrum environment.
- Familiarity with reporting and BI tools used in the reporting & visualization processes.
- Active involvement in Business meetings and team meetings.
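As a hedged sketch of the Spark Core/Spark SQL processing on EMR described above, the job below reads raw JSON from S3, runs a Spark SQL aggregation and writes Parquet back to S3. The S3 paths, view name and aggregation are hypothetical examples, not the production pipeline.

```scala
import org.apache.spark.sql.SparkSession

object DailyPricingRollup {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-pricing-rollup")
      .getOrCreate()

    // Hypothetical S3 paths; EMRFS resolves s3:// URIs on the cluster.
    val raw = spark.read.json("s3://raw-pricing-bucket/updates/2016-01-01/")
    raw.createOrReplaceTempView("pricing_updates")

    // Spark SQL aggregation over the day's inventory updates.
    val rollup = spark.sql(
      """SELECT sku, COUNT(*) AS update_count, MAX(price) AS max_price
        |FROM pricing_updates
        |GROUP BY sku""".stripMargin)

    // Write the refined result back to S3 as Parquet for downstream reporting.
    rollup.write.mode("overwrite").parquet("s3://refined-pricing-bucket/rollups/2016-01-01/")
    spark.stop()
  }
}
```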
Environment: Amazon Web Services, Elastic MapReduce (EMR 4.1.0) cluster, EC2 Instances, m3.xlarge (master), c3.4xlarge (slaves), Airflow, Amazon S3, Amazon Redshift, DynamoDB, CloudWatch Logs, IAM Roles, Ganglia, EMRFS, s3cmd (batches), Ruby EMR Utility (monitoring), Boto3, Amazon Cognito, AWS API Gateway, AWS Lambda, Kinesis (Streams), Hive, Scala, Python, HBase, Sqoop, SBT, Apache Spark, Spark SQL, Shell Scripting, Tableau.
Big Data Engineer
Confidential, Phoenix, AZ
Responsibilities:
- Implemented several scheduled Spark, Hive & MapReduce jobs on the MapR Hadoop distribution.
- Performed data ingestion using Sqoop, used HiveQL & Spark SQL for data processing and scheduled the complex workflows using Oozie.
- Deployed several process-oriented scheduled jobs through crontabs and event engines, using wrapper scripts to invoke the Spark module.
- Developed various main & service classes in Scala and used Spark SQL for data-specific query tasks.
- Performed various data validation jobs in the backend through Hive and HBase.
- Populated HBase tables and queried HBase using the Hive shell.
- Good understanding of Spark Core executors and partitions and the overall high-level architecture of Apache Spark.
- Great familiarity with Hive joins; used HQL for querying the databases, eventually building complex Hive UDFs (see the sketch after this list).
- Expert knowledge of handling the Unix environment, such as changing file and group permissions, and great ability to work through the command-line interface.
- Involved in Data Validation and fixing discrepancies by working in coordination with the Data Integration and Infra Teams.
- Assisted data architects in making decisions regarding the applicable technical stack.
- Resolved data-level issues raised against the functionality alongside the Data Integration team.
- Performed RCA based on module criticality and handled errored feed data in the functionality.
- Performed unit and integration testing with sample test cases, assisted the QA team and addressed several performance issues according to Business Unit requirements.
- Proven expertise in handling exception scenarios for errored feed data in coordination with the data architects, Data Integration team, business partners and stakeholders.
- Participated in regular stand-up meetings, status calls and business owner meetings with stakeholders and risk management teams in an Agile environment.
- Strong understanding of the high-level architecture of the business logic, decomposing complex modules into simple, achievable tasks for efficient development.
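A minimal, hypothetical example of the kind of Hive UDF referenced above: a masking function written against the classic org.apache.hadoop.hive.ql.exec.UDF reflection API. The package, class name, jar path and masking rule are all assumptions.

```scala
package com.example.udf

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Hypothetical UDF: masks all but the last four characters of an account id.
class MaskAccountId extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) {
      null
    } else {
      val value = input.toString
      val masked =
        if (value.length <= 4) value
        else "*" * (value.length - 4) + value.takeRight(4)
      new Text(masked)
    }
  }
}

// Registered in Hive roughly as follows (paths and names are placeholders):
//   ADD JAR hdfs:///user/hive/udfs/mask-udf.jar;
//   CREATE TEMPORARY FUNCTION mask_account AS 'com.example.udf.MaskAccountId';
//   SELECT mask_account(account_id) FROM transactions LIMIT 10;
```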
Environment: MapR Hadoop Distribution, M3&M5, Hive, Scala, HBase, Sqoop, Maven builds, Spark, Spark SQL, Oozie, Linux/Unix, Shell Scripting, UC4 Complex workflow, SVN, Talend, Kafka.
Programmer Analyst
Confidential
Responsibilities:
- Implemented several user stories using Core Java, HTML and CSS.
- Solid understanding of Object Oriented Design and analysis with extensive experience in the full life cycle of the software design process including requirement definition, prototyping, Proof of Concept, Design, Implementation and Testing.
- Experience on the backend using collections, structs and maps, plus DHTML and JavaScript, with Eclipse and Notepad++ as IDEs & tools.
- Involved in Software Development Life Cycle phases like requirement analysis, implementation and estimating timelines for the project.
- Assisted in resolving data level issues dealing with input & output streams.
- Developed custom directives (elements, attributes and classes).
- Developed single-page applications using AngularJS.
- Extensively involved in redesigning the entire site with CSS styles for consistent look and feel across all browsers and all pages.
- Used Angular MVC and two-way data binding.
- Created Images, Logos and Icons that are used across the web pages using Adobe Flash and Photoshop.
- Developed interactive UIs for front-end users using front-end technologies like HTML, CSS, JavaScript and jQuery.
- Designed dynamic client-side JavaScript code to build web forms and simulate processes for the web application, page navigation and form validation.
Environment: Core Java, JavaScript, UI/UX, Linux, Shell Scripting, Web Browsers, Instrumentation, Oracle SQL Server, SQL queries, Relational Databases.