
Hadoop/Spark Developer Resume


Chicago, IL

PROFESSIONAL SUMMARY:

  • 10+ years of strong experience in software development using Java/J2EE, Big Data, Hadoop, Apache Spark, Scala and Python technologies.
  • Around 4 years of experience with Big Data tools in the Hadoop ecosystem, including Pig, Hive, HDFS, MapReduce, Sqoop, Storm, YARN, Oozie and ZooKeeper.
  • Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • 2+ years of experience using the Spark Core API, Spark SQL and Spark Streaming.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle (see the timing sketch after this list).
  • Strong experience with Hadoop distributions such as Cloudera, MapR, Hortonworks and Databricks.
  • Experience in manipulating/analysing large datasets and finding patterns and insights within structured and unstructured data.
  • Good understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as HBase, Cassandra and MongoDB.
  • Experienced in writing complex MapReduce programs that work with different file formats such as Text, Sequence, XML, Parquet and Avro.
  • Experience in migrating data using Sqoop from HDFS to relational database systems and vice versa, according to the client's requirements.
  • Extensive experience importing and exporting data using stream-processing platforms such as Flume and Kafka.
  • Hands-on experience with Spark, Scala and Kafka for handling streaming data.
  • Hands-on experience creating dashboard reports and business intelligence visualizations using Tableau.
  • Hands-on experience developing a hybrid test automation framework using Selenium WebDriver, Java, TestNG, Cucumber, Maven and Jenkins.
  • Experience in data validation, data warehousing ETL concepts and back-end testing to check data integrity.
  • Hands-on experience automating iOS and Android native, hybrid and web applications using Appium, Java, the Android SDK and Xcode.
  • Experience with TestNG, JUnit, data-driven and keyword-driven frameworks using Selenium; good understanding of the annotations used with TestNG and of Selenium Grid.
  • Hands-on experience testing web services using SoapUI NG/Ready API.
  • Hands-on experience in API testing using a Java- and XML-based framework.
  • Integrated frameworks with cloud applications such as BrowserStack, Sauce Labs and AWS Device Farm to execute automation suites in the cloud for web and mobile apps.
  • Experienced in working with Amazon Web Services (AWS), using EC2 for computing and S3 for storage.
  • Key participant in all phases of the software development life cycle: analysis, design, development, integration, implementation, debugging and testing of software applications in client-server, object-oriented and web-based environments.
  • Well versed in the Agile/Scrum process and the sprint life cycle.
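
As a hedged illustration of the Spark-vs-Hive comparison work mentioned above, here is a minimal Scala sketch of the kind of timing harness such a comparison might use. The "sales" table, its columns and the app name are hypothetical placeholders, not details from the actual projects:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object SparkVsHiveTiming {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("spark-vs-hive"))
    val hc = new HiveContext(sc)

    // Simple wall-clock timer used to compare the same aggregation
    // run through Spark SQL versus other engines.
    def time[A](label: String)(body: => A): A = {
      val start = System.nanoTime()
      val result = body
      println(f"$label took ${(System.nanoTime() - start) / 1e9}%.2f s")
      result
    }

    // The "sales" table and its columns are hypothetical.
    time("spark-sql aggregation") {
      hc.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").collect()
    }

    sc.stop()
  }
}
```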

TECHNICAL SKILLS:

Big Data/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, ZooKeeper and Oozie

NoSQL Databases: HBase, Cassandra, MongoDB

Automation Tools: Selenium (RC, WebDriver and Grid), UFT/QTP, SoapUI, Ready API, Appium, Perfecto, SeeTest and Squish

Frameworks: Cucumber, TestNG, JUnit and Hybrid

Programming Languages: Java, VBScript, Scala, Python and C

Development Editors: Eclipse, NetBeans and IntelliJ

Project Management Tools: ClearQuest, Rally, JIRA and ALM

Test Management Tools: ALM/QC

Source Code Control Tools: SVN, CVS, ClearCase, ALM/QC, Stash and Git

Cloud Applications: Browser Stack, Sauce Labs and AWS Device Farm

CI and Build Tools: ANT, Maven and Jenkins

Operating Systems: Windows, Unix (Sun Solaris), Linux and Ubuntu

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Hadoop/Spark Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Oozie workflows.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which consumes data from Kafka in near real time and persists it into Cassandra (see the sketch after this list).
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using DataFrames/Datasets/SQL and RDD/MapReduce APIs in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications by setting the right batch interval, choosing the correct level of parallelism and tuning memory.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyse the logs produced by the Spark cluster.
  • Performed advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
  • Experienced in handling large datasets using partitions, Spark's in-memory capabilities, broadcasts, effective and efficient joins, and other transformations during the ingestion process itself.
  • Designed, developed and maintained data integration programs in Hadoop and RDBMS environments, with both traditional and non-traditional source systems and with RDBMS and NoSQL data stores for data access and analysis.
  • Worked on a POC comparing the processing time of Impala with Apache Hive for batch applications, with a view to implementing the former in the project.
  • Worked on a cluster of 120 nodes.
  • Worked extensively with Sqoop for importing metadata from Oracle.
  • Responsible for developing a data pipeline with Amazon AWS to extract data from weblogs and store it in HDFS.
  • Involved in creating Hive tables and in loading and analysing data using Hive queries.
  • Developed Hive queries to process the data and generate the data cubes for visualization.
  • Implemented schema extraction for Parquet and Avro file formats in Hive.
  • Good experience with Talend Open Studio for designing ETL jobs for data processing.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Good experience with continuous integration of the application using Jenkins.
  • Used reporting tools like Tableau, connected to Hive, to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability
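
Below is a minimal sketch of the Kafka-to-Cassandra streaming path described above, using the Spark 1.6-era direct Kafka stream and the DataStax spark-cassandra-connector. The broker address, topic, keyspace, table and CSV record layout are all hypothetical, standing in for the project's actual configuration:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object LearnerModelStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("learner-model-stream")
      .set("spark.cassandra.connection.host", "cassandra-host") // hypothetical host
    val ssc = new StreamingContext(conf, Seconds(10)) // batch interval is workload-specific

    // Direct (receiver-less) Kafka stream; broker and topic are hypothetical.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("learner-events"))

    // Assumed CSV payload: learner_id,attribute,value
    stream.map { case (_, value) =>
      val fields = value.split(',')
      (fields(0), fields(1), fields(2))
    }.saveToCassandra("learner_ks", "learner_model", // hypothetical keyspace/table
      SomeColumns("learner_id", "attribute", "value"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```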

Confidential, Chandler, AZ

Spark Developer

Responsibilities:

  • Worked on analysing the Hadoop cluster using different big data analytic tools, including Pig, Hive and MapReduce.
  • Managed the fully distributed Hadoop cluster as an additional responsibility, having been trained to take on Hadoop Administrator duties, including managing the cluster, performing upgrades and installing tools that use the Hadoop ecosystem.
  • Worked on installing and configuring ZooKeeper to coordinate and monitor the cluster resources.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Worked on POCs with Apache Spark using Scala to implement Spark in the project.
  • Consumed data from Kafka using Apache Spark.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data.
  • Involved in loading data from the Linux file system to HDFS.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Implemented partitioning, dynamic partitions and buckets in Hive (see the sketch after this list).
  • Worked on creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Extended Hive and Pig core functionality with custom User Defined Functions (UDFs), User Defined Table-Generating Functions (UDTFs) and User Defined Aggregate Functions (UDAFs), written in Python.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Experienced in performing CRUD operations in HBase.
  • Developed Oozie workflows to automate the tasks of loading the data into HDFS and pre-processing it with Pig.
  • Responsible for loading data files from various external sources such as Oracle and MySQL into staging areas in MySQL databases.
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Actively involved in code review and bug fixing for improving the performance.
  • Involved in developing, building, testing and deploying to the Hadoop cluster in distributed mode.
  • Created Linux shell scripts to automate the daily ingestion of IVR data.
  • Processed the raw data using Hive jobs, scheduling them via crontab.
  • Helped the analytics team with Aster queries using HCatalog.
  • Automated the History and Purge Process.
  • Created HBase tables to store various data formats of incoming data from different portfolios.
  • Created Pig Latin scripts to sort, group, join and filter the enterprise-wide data.
  • Developed the verification and control process for daily load.
  • Provided daily production support, monitoring and troubleshooting Hadoop/Hive jobs.
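
As a sketch of the Hive partitioning and bucketing mentioned above, here is a minimal example driven through a Spark 1.6 HiveContext. The table names, columns and bucket count are hypothetical, and the staging table is assumed to exist:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hive-partition-sketch"))
    val hc = new HiveContext(sc)

    // Partition by load date and bucket by id (names are hypothetical).
    hc.sql("""CREATE TABLE IF NOT EXISTS txn (id STRING, amount DOUBLE)
              PARTITIONED BY (load_dt STRING)
              CLUSTERED BY (id) INTO 32 BUCKETS
              STORED AS PARQUET""")

    // Dynamic partitioning routes each row to its partition at write time.
    hc.sql("SET hive.exec.dynamic.partition=true")
    hc.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    hc.sql("SET hive.enforce.bucketing=true")
    hc.sql("""INSERT OVERWRITE TABLE txn PARTITION (load_dt)
              SELECT id, amount, load_dt FROM staging_txn""") // staging_txn is assumed

    sc.stop()
  }
}
```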

Confidential, Dallas, TX

Hadoop Developer

Responsibilities:

  • Worked on the proof of concept for the adoption of the Apache Hadoop 1.20.2 framework
  • Installed and configured Hadoop clusters and eco-system
  • Developed automated scripts to install Hadoop clusters
  • Involved in all phases of the Big Data implementation, including requirement analysis, design, development, building, testing and deployment of the Hadoop cluster in fully distributed mode
  • Mapped the DB2 V9.7 and V10.x data types to Hive data types and validated the mappings
  • Loaded and retrieved unstructured data (CLOB, BLOB, etc.)
  • Developed Hive jobs to transfer 8 years of bulk data from DB2 to the HDFS layer
  • Implemented data integrity and data quality checks in Hadoop using Hive and Linux scripts
  • Built a job automation framework to support and operationalize data loads
  • Automated the DDL creation process in Hive by mapping the DB2 data types (see the sketch after this list)
  • Monitored Hadoop cluster job performance and capacity planning.
  • Collected and aggregated large amounts of log data using Apache Flume, staging the data in HDFS for further analysis.
  • Experienced with the Hadoop framework and with HDFS and MapReduce processing implementation.
  • Tuned Hadoop performance for high availability and was involved in the recovery of Hadoop clusters.
  • Responsible for coding Java batch programs, RESTful services, MapReduce programs and Hive queries, as well as testing, debugging, peer code review, troubleshooting and maintaining status reports.
  • Used the Avro and Parquet file formats for serialization of data.
  • Good experience with Informatica Power Center.
  • Developed several test cases using MRUnit for testing MapReduce applications
  • Responsible for troubleshooting and resolving the performance issues of Hadoop cluster.
  • Used the Bzip2 compression technique to compress the files before loading them into Hive
  • Supported and troubleshot Hive programs running on the cluster and was involved in fixing issues arising out of duration testing.
  • Prepared daily and weekly project status reports and shared them with the client.
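
A hedged sketch of the DB2-to-Hive DDL automation mentioned above: a small Scala generator that maps a partial, assumed set of DB2 column types to Hive types and emits a CREATE TABLE statement. The table and columns in the example are illustrative only:

```scala
object Db2ToHiveDdl {
  // Assumed (partial) mapping of DB2 column types to Hive types.
  private val typeMap = Map(
    "INTEGER"   -> "INT",
    "BIGINT"    -> "BIGINT",
    "VARCHAR"   -> "STRING",
    "CHAR"      -> "STRING",
    "DECIMAL"   -> "DECIMAL",
    "TIMESTAMP" -> "TIMESTAMP",
    "CLOB"      -> "STRING")

  // Emit a Hive CREATE TABLE for a DB2 table described as (name, db2Type) pairs.
  def ddlFor(table: String, cols: Seq[(String, String)]): String = {
    val body = cols.map { case (name, db2Type) =>
      s"  $name ${typeMap.getOrElse(db2Type.toUpperCase, "STRING")}"
    }.mkString(",\n")
    s"CREATE TABLE IF NOT EXISTS $table (\n$body\n) STORED AS PARQUET"
  }

  def main(args: Array[String]): Unit =
    // Table and columns are illustrative only.
    println(ddlFor("accounts", Seq("acct_id" -> "BIGINT", "owner" -> "VARCHAR")))
}
```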

Confidential, Corvallis, OR

Automation Engineer

Responsibilities:

  • Analysed the manual test cases for the feasibility of automation in the regression phase
  • Implemented an acceptance testing suite using Cucumber and Java for stakeholders, covering the base test scenarios
  • Performed an automation feasibility study using Selenium WebDriver and demonstrated proofs of concept
  • Integrated with Appium to execute the Selenium scripts on iOS and Android mobile devices.
  • Designed, developed and implemented a test automation framework best suited to the test infrastructure requirements
  • Involved in identifying automation test scenarios and setting up the reusable functions, repository, environment variables, test data and functions required for the automation scripts
  • Identified test data and organized it scenario-wise in Excel files for test input at run time
  • Created business functions and generic functions to be reused across scripts, enabling a single point of maintenance
  • Prepared automation test scripts to validate various functionalities using Selenium WebDriver (see the sketch after this list)
  • Prepared review reports (code reviews, execution reviews) for the automation scripts
  • Helped in preparing driver scripts, data-driven tests and test suites.
  • Involved in executing the automation scripts and storing the results
  • Communicated with members of other teams (Development, Technical Support, Business Support) to resolve issues
  • Analysed test results and reported defects
  • Prepared status reports such as the daily status report, weekly status report and monthly consolidated report
  • Prepared a user guide and an installation guide to help novice users with execution
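
Here is a minimal sketch of a Selenium WebDriver validation script like those described above, written in Scala against the Selenium Java bindings. The URL, element locators and post-login page title are hypothetical:

```scala
import org.openqa.selenium.By
import org.openqa.selenium.chrome.ChromeDriver

object LoginSmokeCheck {
  def main(args: Array[String]): Unit = {
    val driver = new ChromeDriver() // requires chromedriver on the PATH
    try {
      driver.get("https://example.com/login")                     // hypothetical URL
      driver.findElement(By.id("username")).sendKeys("testuser")  // assumed locators
      driver.findElement(By.id("password")).sendKeys("secret")
      driver.findElement(By.id("submit")).click()
      // Assumed post-login page title.
      assert(driver.getTitle.contains("Dashboard"), "login did not land on Dashboard")
    } finally {
      driver.quit()
    }
  }
}
```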

Confidential

Automation Engineer

Responsibilities:

  • Performed an automation feasibility study using Selenium WebDriver and demonstrated a proof of concept
  • Prepared review reports (code reviews, execution reviews) for the automation scripts
  • Using Xcode, deployed the app on iOS devices (iPhone and iPad) and executed the Selenium automation scripts.
  • Integrated with the Android SDK and automated the mobile application on Android devices.
  • Analysed the manual test cases for the feasibility of automation in the regression phase
  • Implemented an acceptance testing suite using Cucumber and Ruby for stakeholders, covering the base test scenarios
  • Prepared status reports such as the daily status report, weekly status report and monthly consolidated report
  • Communicated with members of other teams (Development, Technical Support, Business Support) to resolve issues
  • Identified test data and organized it scenario-wise in Excel files for test input at run time
  • Created business functions and generic functions to be reused across scripts, enabling a single point of maintenance
  • Involved in executing the automation scripts and storing the results
  • Analysed test results and reported defects
  • Designed, developed and implemented a test automation framework best suited to the test infrastructure requirements.

Confidential

Software Engineer

Responsibilities:

  • Understood the requirement specifications and design documents.
  • Performed an automation feasibility study using Selenium WebDriver and demonstrated a proof of concept
  • Designed and developed a Selenium framework using Java and TestNG
  • Integrated with Appium to execute the Selenium scripts on iOS and Android mobile devices.
  • Designed and implemented a BDD-based Cucumber framework for acceptance testing.
  • Designed the framework to be data-driven, i.e. the test behaviour changes based on the user's input data, and the number of test runs depends on the input
  • Identified test data and organized it scenario-wise in CSV files for test input at run time (see the sketch after this list)
  • Maintained the scripts, per application version, in SVN
  • Analysed test execution results and posted defects with detailed steps and screenshots in Rally
  • Reviewed all test artefacts and scripts developed by the team and maintained a review log for future reference
  • Maintained constant coordination with the client through emails and weekly conference calls
  • Scheduled nightly builds, analysed the reports and distributed failures across the team to be fixed
  • Prepared status reports such as the daily status report, weekly status report and monthly consolidated report.
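
A hedged sketch of the data-driven TestNG pattern described above, in Scala. In the project the rows came from CSV files, so the inline data, class name and assertion target here are placeholders:

```scala
import org.testng.Assert.assertEquals
import org.testng.annotations.{DataProvider, Test}

class SearchTermTest {
  // Inline rows stand in for the CSV-driven test data used in the project.
  @DataProvider(name = "terms")
  def terms(): Array[Array[AnyRef]] = Array(
    Array("hadoop", Int.box(6)),
    Array("spark", Int.box(5)))

  // TestNG runs this once per data row; the behaviour under test
  // is a trivial placeholder for the real page-object call.
  @Test(dataProvider = "terms")
  def termLengthMatches(term: String, expectedLength: Integer): Unit = {
    assertEquals(term.length, expectedLength.intValue())
  }
}
```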

Confidential

Software Engineer

Responsibilities:

  • Analysed the test cases to be automated and generated the scripts.
  • Developed automation test plans and test scripts from scratch using VBScript.
  • Extensively used Quality Center for maintaining test cases and QTP scripts and for reporting bugs.
  • Adhered to VBScript coding standards to keep library functions readable, reusable and maintainable.
  • Enhanced the scripts using descriptive programming and various conditions.
  • Developed test scripts, executed them and analysed the test results.
  • Involved in regression testing, acceptance testing and production acceptance testing.
  • Coordinated with the development team on root cause analysis to resolve defects.
  • Acted as UAT test coordinator, assisting the users from a testing point of view.
  • Ensured that all UAT-phase defects were fixed and closed and that the UAT test scripts and results were documented.
