AWS Spark Developer Resume New Jersey - Hire IT People

SUMMARY

7 years of experience in the IT industry as a software engineer, including 3 years of design and development using Hadoop big data ecosystem tools.
Experience in using different Hadoop ecosystem components such as HDFS, YARN, MapReduce, Spark, Pig, Sqoop, Hive, Impala, HBase and Kafka, as well as crontab.
Experience in developing ETL applications on large volumes of data using different tools: MapReduce, PySpark, Spark SQL, Hive and Pig.
Well versed in big data on AWS cloud services, i.e. EC2, S3, EMR, Lambda and CloudWatch.
Well versed in using the MapReduce programming model for analyzing data stored in HDFS, with experience writing MapReduce code in Java per business requirements.
Experience in importing and exporting data using Sqoop from RDBMS to HDFS and Hive.
Expert in creating Spark UDFs in Python to analyze data sets for complex aggregate requirements.
Experience in developing Spark jobs to run entity resolution on PII data of the United States population.
Responsible for maintaining, running and optimizing Spark jobs, along with data validation and automation.
Experience in performing extensive data analysis in Spark using complex DataFrame and RDD transformations, and in creating external Hive tables on top of the aggregated data for downstream consumption.
Used shell scripting to automate the triggering of Spark jobs.
Used HBase for real-time, low-latency reads and writes for multiple applications.
Well versed in developing complex SQL queries using Hive and Spark SQL.
Experienced in preparing and executing unit test plans and unit test cases during software development.
Strong understanding of object-oriented programming concepts and implementation.
Experience in providing training and guidance to new team members on the project.
Experience in detailed system design using use case analysis, functional analysis, and modeling with class, sequence, activity and state diagrams using UML and Rational Rose.
Used JIRA for creating tasks, logging work and tracking issues.
Very good experience in customer specification study, requirements gathering, system architectural design and turning requirements into the final product/service.
Experience in interacting with customers and working at client locations for real-time field testing of products and services.
Ability to communicate and work effectively with associates at all levels within the organization.
Strong background in mathematics, with very good analytical and problem-solving skills.

TECHNICAL SKILLS

Programming: Python, Core Java and SQL.

Big Data Ecosystem: HDFS, YARN, MapReduce, Spark Core, Spark SQL, Impala, Hive, Pig, Kafka and Sqoop

AWS Stack: EC2, S3, EMR, Lambda and CloudWatch

ELK Stack: Elasticsearch, Logstash and Kibana

Scripting Languages: UNIX Shell scripting and Python scripting.

DBMS / RDBMS: Oracle 11g, SQL Server

Version Control: Git and Bitbucket

CI/CD Tool: Jenkins

PROFESSIONAL EXPERIENCE

Confidential, New Jersey

AWS Spark Developer

Responsibilities:

Develop software to combine data from legacy databases and files into various data marts using PySpark.
Worked extensively on AWS components such as Elastic MapReduce (EMR), DynamoDB, Lambda, RDS etc.
Worked with Parquet files and converted data between formats.
Parsed semi-structured JSON data and converted it to Parquet using DataFrames in PySpark.
Developed a Python script to load CSV files into S3 buckets; created AWS S3 buckets, performed folder management in each bucket, and managed logs and objects within each bucket.
Deployed a scalable Hadoop cluster on AWS using S3 as the underlying file system for Hadoop.
Connected to a MySQL database through the Spark driver.
Experienced in Agile methodologies, Scrum stories and sprints in a Python-based environment, along with data analytics, data wrangling and Excel data extracts.
Develop and run serverless Spark-based applications using the AWS Lambda service and PySpark to compute metrics for various business requirements.
Develop Python, shell-scripting and Spark-based applications using the PyCharm IDE and the Anaconda distribution.
Use Git as the version control tool for maintaining software and Jenkins as the continuous integration and continuous deployment tool for deploying applications to production servers.
Push AWS EMR and AWS Lambda logs to Elasticsearch using Logstash for log analysis.
Create dashboards from documents in Elasticsearch, using Kibana as the visualization tool.
Analyze failed jobs in the AWS EMR Spark cluster, identify the cause of failures, and improve job performance and minimize or avoid failures using PySpark.
Use JIRA for updating tasks/stories created by the agile lead.
Analyze, store and process data captured from different sources using AWS cloud services such as S3, CloudFormation, Lambda and EMR.
Automate jobs using shell scripting and schedule those jobs to run at a specific time using crontab.
Test and validate the developed applications in development and QA environments and deploy them to the production environment.
Develop unit test cases for the software before deploying it to production servers.
Use Git with Jenkins to integrate the work done by all team members.
Use the agile Scrum methodology, along with JIRA, for the project.
Responsible for debugging and troubleshooting the running applications in production.
Participated in writing scripts for test automation.

Environment: Spark Core, AWS S3, EMR, Lambda, CloudFormation, CloudWatch, Python, Hive, Presto, Crontab, Elasticsearch and Kibana.

Confidential, Texas

Hadoop Developer

Responsibilities:

Extensively used Spark Core (RDDs and DataFrames) and Spark SQL to develop multiple applications in both Python and Scala.
Built multiple data pipelines using Pig scripts for processing data for specific applications.
Used different file formats such as Parquet, Avro, and ORC for storing and retrieving data in Hadoop.
Used Spark Streaming for consuming event-based data from Kafka and joined this data set with existing Hive table data to generate performance indicators for an application.
Developed analytical queries on different tables using Spark SQL for finding insights and building data pipelines for data scientists to consume this data for applying ML models.
Tuned Spark performance by applying different techniques: choosing optimum parallelism and the serialization format for shuffling data, using broadcast variables, and optimizing joins, aggregations, and memory management.
Wrote multiple custom Sqoop import scripts to load data from Oracle into HDFS directories and Hive tables.
Used NiFi for automating and managing data flows between multiple systems.
Used different compression codecs, Snappy and Gzip, when storing data in Hive tables to improve performance.
Used Impala for faster querying in a time-critical application to generate reports.
Also used HBase for OLTP purposes in an application requiring high scalability on Hadoop.
Wrote Sqoop export scripts to write data from HDFS into an Oracle database.
Used Control-M to simplify and automate different batch workload applications.
Worked closely with multiple data science and machine learning teams in building a data ecosystem to support AI.
Also developed a Java-based application to automate most of the manual work in onboarding a tenant to a multi-tenant environment. This saves around 4 to 5 hours of manual work per tenant per person every day.
Applied different job tuning techniques while processing data using the Hive and Spark frameworks to improve the performance of jobs.

Environment: Spark Core, Spark Streaming, Core Java, Python, Hive, Impala, HBase, Sqoop, Kerberos (security), LDAP, and Control-M.

Confidential

Software Engineer

Responsibilities:

Analyzed the Hadoop cluster and different big data analytics tools including Pig, Hive, and Sqoop.
Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
Developed an ETL framework using Spark, Pig and Hive.
Developed Spark programs for batch and real-time processing to process incoming streams of data from Kafka sources, transform them into DataFrames and load those DataFrames into Hive and HDFS.
Experience in developing SQL scripts using Spark for handling different data sets and verifying the performance of MapReduce jobs.
Developed Spark programs using the Spark SQL library to perform analytics on data in Hive.
Developed various Java UDFs for use in both Hive and Impala to meet various requirements.
Created multiple MapReduce jobs in Pig and Hive for data cleaning and preprocessing.
Created Hive views/tables to provide a SQL-like interface.
Loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Used Hive to analyze the partitioned data and compute various metrics for reporting.
Transformed Impala queries into Hive scripts that can be run directly from shell commands for higher performance.
Created shell scripts that can be scheduled using Oozie workflows and Oozie coordinators.
Developed Oozie workflows to generate monthly report files automatically.
Managed and reviewed Hadoop log files.
Exported data from HDFS into RDBMS using Sqoop for report generation and visualization.

Environment: Hadoop, MapReduce, Sqoop, HDFS, Hive, Pig, Oozie, Java, Oracle 10g, MySQL, and Impala.
