
Big Data Developer Resume

SUMMARY

  • 7+ years of experience as an IT professional across various projects and environments, with extensive knowledge of the Big Data and Hadoop ecosystem.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on data file formats such as Parquet, ORC, Sequence, .txt, and .csv (see the Spark SQL sketch after this summary).
  • In-depth knowledge of and hands-on experience with Apache Hadoop components and related tools such as HDFS, HBase, Hive, Spark, Sqoop, MapReduce, Oozie, Control-M, Tidal, and MongoDB.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • In-depth understanding of Spark architecture, including Spark Core, Spark SQL, DataFrames, and Spark Streaming for real-time processing.
  • Experienced in tuning Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
  • Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Excellent understanding and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Designed databases; created and managed schemas; wrote stored procedures, functions, DDL, DML, and SQL queries; and performed data modeling.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3), EMR, and Amazon Elastic Compute Cloud (Amazon EC2).
  • Implemented ad-hoc queries using Hive to perform analytics on structured data.
  • Expertise in writing Hive UDFs and generic UDFs to incorporate complex business logic into Hive queries.
  • Experienced in optimizing Hive queries by tuning configuration parameters.
  • Worked extensively on developing ETL programs supporting data extraction, transformation, and loading using Informatica PowerCenter.
  • Hands-on experience with ETL tools such as Informatica, Infoworks, and Attunity.
  • Extensive experience in ETL architecture, development, enhancement, maintenance, production support, data modeling, data profiling, and reporting, including business and system requirements gathering.
  • Hands-on experience in shell scripting; knowledge of the AWS and MS Azure cloud services.
  • Proficient in applying RDBMS concepts with MariaDB, Oracle, MS SQL Server, and MySQL.
  • Experienced in the full project life cycle (design, development, testing, and implementation) of client-server and web applications.
  • Independently perform complex troubleshooting, root-cause analysis, and solution development.
  • Able to meet deadlines and handle multiple tasks; decisive, with strong leadership qualities, flexible work schedules, and good communication skills.
  • Experience in processing different file formats such as XML, JSON, and sequence files.
  • Good experience creating Business Intelligence solutions and designing ETL workflows using Tableau.
  • Collaborated with development and QA teams to maintain high-quality deployments.
  • Infrastructure migrations: drove operational efforts to migrate all legacy services to a fully virtualized infrastructure.
  • Learn and adapt quickly to emerging technologies and paradigms.
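
To make the Spark SQL experience above concrete, here is a minimal sketch of reading a Parquet source and aggregating it through both the DataFrame API and a SQL view; the HDFS paths and column names (customer_id, amount) are hypothetical placeholders, not details from an actual engagement.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object ParquetQuery {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ParquetQuery")
      .getOrCreate()

    // Read a Parquet source; path and columns are hypothetical.
    val orders = spark.read.parquet("hdfs:///data/orders")

    // The aggregation expressed through the DataFrame API...
    val totals = orders
      .groupBy(col("customer_id"))
      .agg(sum(col("amount")).as("total_amount"))

    // ...or equivalently through Spark SQL on a temp view.
    orders.createOrReplaceTempView("orders")
    val totalsSql = spark.sql(
      "SELECT customer_id, SUM(amount) AS total_amount FROM orders GROUP BY customer_id")

    // Write the result out as ORC, another format named above.
    totals.write.mode("overwrite").orc("hdfs:///out/order_totals")
    spark.stop()
  }
}
```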

PROFESSIONAL EXPERIENCE

Confidential

Big Data Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster and various big data analytics tools, including Spark, HDFS, Hive, and Sqoop.
  • Developed Spark code using Scala and Spark SQL for faster testing and data processing.
  • Involved in the development of a Spark application for one of the data sources using Scala and Spark.
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in tuning Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
  • Involved in performance tuning wherever code execution showed latency or delays.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses (see the pair-RDD sketch after this list).
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Exported the analyzed data to relational databases using Sqoop, for visualization and for generating reports for the BI team.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Developed various algorithms for generating several data patterns. Created Oozie workflows to run multiple MapReduce, Hive, and Pig jobs.
  • Supported setting up the QA environment and updating configurations for implementing scripts with Hive and Sqoop.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Worked on tuning the performance of MapReduce jobs.
  • Involved in loading data from the Linux file system into HDFS.
  • Analyzed the latest big data analytics technologies and their innovative applications in both business intelligence analysis and new service offerings.
  • Tuned Hive to improve performance and resolve performance-related issues in Hive scripts, with a good understanding of joins, grouping, and aggregation and how they map to MapReduce jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
  • Created a POC on Cloudera and suggested best practices for the CDH platform.
  • Installed and configured a CDH cluster, using Cloudera Manager for easy management of the existing Hadoop cluster.
  • Worked on setting up high availability for the major production cluster. Performed Hadoop version updates using automation tools.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
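
As a hedged illustration of the in-memory RDD computation mentioned above, the following Scala sketch loads text from HDFS and performs a MapReduce-style count as pair-RDD transformations; the input path and the comma-delimited record layout are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession

object PairRddAggregation {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("PairRddAggregation").getOrCreate()
    val sc = spark.sparkContext

    // Load raw text from HDFS (path is hypothetical).
    val lines = sc.textFile("hdfs:///data/events/*.txt")

    // A MapReduce-style count expressed as RDD transformations:
    // map each record to a (key, 1) pair, then reduce by key in memory.
    val counts = lines
      .map(line => line.split(",", -1))
      .filter(_.length > 1)
      .map(fields => (fields(0), 1L))
      .reduceByKey(_ + _)

    // Cache when the result feeds several downstream actions.
    counts.cache()
    counts.saveAsTextFile("hdfs:///out/event_counts")

    spark.stop()
  }
}
```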

Environment: Hadoop, HDFS, Yarn, Sqoop, Hive, Cloudera Manager, Shell Scripting, Linux, Red Hat, MongoDB, Control-M, Informatica, Attunity Replica, Spark, Scala, Java.

Confidential

Hadoop Developer

Responsibilities:

  • Involved in building a scalable distributed data lake system for Confidential real-time and batch analytical needs.
  • Experience in job management using fair scheduling; developed job processing scripts using Control-M workflows.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 2.4.0 for data aggregation and queries, and for writing data back into the OLTP system through Sqoop (see the UDF sketch after this list).
  • Experienced in tuning Spark applications: setting the right batch interval, choosing the correct level of parallelism, and memory tuning.
  • Involved in performance tuning wherever code execution showed latency or delays.
  • Loaded data into Spark RDDs and performed in-memory computation to generate output responses.
  • Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
  • Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
  • Worked on a POC comparing processing times of Impala and Apache Hive for batch applications, to implement the former in the project.
  • Worked extensively with Sqoop to import metadata from Oracle.
  • Worked extensively with Sqoop to import data from Oracle, Teradata, and MySQL into Hadoop.
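
A minimal sketch of the DataFrame/UDF pattern described above: a Scala UDF normalizes a column, the DataFrame API aggregates, and the result is written back to an OLTP store. The Hive table, column names, and JDBC connection details are placeholders; the original write-back went through Sqoop, and a Spark JDBC write is shown here only as one common alternative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object AggregateAndWriteBack {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("AggregateAndWriteBack").getOrCreate()

    // A simple Scala UDF that normalizes a code column (logic is illustrative).
    val normalize = udf((s: String) => if (s == null) "UNKNOWN" else s.trim.toUpperCase)

    val txns = spark.table("warehouse.transactions")   // hypothetical Hive table

    val daily = txns
      .withColumn("channel", normalize(col("channel")))
      .groupBy(col("txn_date"), col("channel"))
      .agg(sum(col("amount")).as("daily_amount"))

    // Write back to an OLTP store over JDBC; connection details are placeholders.
    daily.write
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "RPT.DAILY_CHANNEL_TOTALS")
      .option("user", sys.env.getOrElse("DB_USER", ""))
      .option("password", sys.env.getOrElse("DB_PASS", ""))
      .mode("append")
      .save()

    spark.stop()
  }
}
```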

Environment: Hadoop, HDFS, Yarn, Sqoop, Hive, Cloudera Manager, Shell Scripting, Linux, Red Hat, MongoDB, Control-M, Informatica, Attunity Replica, Spark, Scala, Java.

Confidential

Software Engineer

Responsibilities:

  • Developed and implemented a Page Object Model (POM) based framework with Selenium WebDriver using object-oriented Java and TestNG (see the page-object sketch after this list).
  • Implemented functional testing with automation test frameworks such as keyword-driven and data-driven to ensure code reusability and maintainability, which reduces script development time.
  • Developed a RESTful web service testing framework with REST Assured to build robust and scalable web service tests.
  • Responsible for the implementation and ongoing administration of Hadoop infrastructure.
  • Involved in the requirements phase to understand the application impact, and assisted system analysts in gathering inputs for the preparation of the Functional Specification Document.
  • Worked extensively on Spring Boot for building web services and integrated Apache Camel (ESB) with Spring Boot.
  • Involved in the design and implementation of the web tier using Servlets and JSP.
  • Developed and implemented Behavior-Driven Development (BDD) and Behavior-Driven Testing (BDT) with Cucumber JVM in support of Test-Driven Development (TDD).
  • Developed feature files and scenarios in the Gherkin language for behavior-driven testing.
  • Responsible for Acceptance Test-Driven Development (ATDD) and Behavior-Driven Development (BDD) approaches to developing and testing software.
  • Created a modular, automated global test framework library for reusable, easy-to-use, and easy-to-maintain automated test scripts.
  • Analyzed technical specifications, business requirements, and database schemas to develop test cases.
  • Responsible for planning, creating, and analyzing the Test Plan, Test Strategy, test cases, test scripts, and Test Matrix.
  • Used detailed knowledge of application features and functions to assess the scope and impact of business needs throughout analysis and completion of all enhancement specifications.
  • Participated in requirements walkthroughs with users to better understand their requirements.
  • Performed backend testing using SQL queries to retrieve and verify information in the database.
  • Developed web service tests with SoapUI to test the SOA platform.
  • Coordinated among QA managers, developers, and team members.
  • Reported and tracked defects, and monitored defects reported by the team.
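
A minimal sketch of the Page Object Model pattern named in the first bullet, written in Scala to keep all examples in this document in one language (the original framework was Java) with Selenium WebDriver and TestNG. The page URL, locators, and credentials are hypothetical.

```scala
import org.openqa.selenium.{By, WebDriver}
import org.openqa.selenium.chrome.ChromeDriver
import org.testng.Assert
import org.testng.annotations.{AfterClass, BeforeClass, Test}

// Page object: one class per page, with locators and actions kept together
// so tests never touch raw selectors directly.
class LoginPage(driver: WebDriver) {
  private val username = By.id("username")   // locators are hypothetical
  private val password = By.id("password")
  private val submit   = By.id("login")

  def open(baseUrl: String): LoginPage = {
    driver.get(s"$baseUrl/login")
    this
  }

  def loginAs(user: String, pass: String): Unit = {
    driver.findElement(username).sendKeys(user)
    driver.findElement(password).sendKeys(pass)
    driver.findElement(submit).click()
  }
}

class LoginTest {
  private var driver: WebDriver = _

  @BeforeClass def setUp(): Unit = { driver = new ChromeDriver() }
  @AfterClass def tearDown(): Unit = { driver.quit() }

  @Test def validLoginShowsDashboard(): Unit = {
    new LoginPage(driver).open("https://app.example.com").loginAs("qa_user", "secret")
    Assert.assertTrue(driver.getTitle.contains("Dashboard"))
  }
}
```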

Environment: Selenium WebDriver, Java, Cucumber, Gherkin, Jenkins, TestNG, RESTful web services, Servlets, SOA, SoapUI, Jira, ALM/QC, HTML, SQL Server, Oracle.
