
Hadoop Developer Resume


SUMMARY:

  • Over 9 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Confidential Analytics.
  • More than 6 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
  • Proficient with the Apache Spark ecosystem, including Spark Core and Spark Streaming, using Scala and Python.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
  • Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, NodeJS, Ajax, jQuery and AngularJS.
  • Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
  • Strong experience working with different Hadoop distributions, including Cloudera, Hortonworks, MapR and Apache distributions.
  • Experience in installation, configuration, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Experience in Amazon AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Confidential.
  • In-depth understanding and knowledge of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability and YARN, with a good understanding of workload management, scalability and distributed platform architectures.
  • Strong experience and knowledge of real time data analytics using Storm, Kafka, Flume and Spark.
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Experience in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirement.
  • Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the securi

PROFESSIONAL EXPERIENCE:

Confidential

Hadoop Developer

Responsibilities:

  • Developed Spark scripts using Scala as per the requirement. Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark (a minimal sketch of this pattern follows this list). Developed micro-services using Python scripts with the Spark DataFrame API for the semantic layer. Involved in the complete implementation lifecycle, specializing in writing custom MapReduce, Pig and Hive code. Used NiFi for transferring data from source to destination and was responsible for handling batch as well as real-time Spark jobs through NiFi.
  • Responsible for managing data coming from different sources and involved in HDFS maintenance and the loading of structured and unstructured data. Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data. Built the complete data ingestion pipeline using NiFi, which POSTs flow files through the InvokeHTTP processor to our microservices hosted inside Docker containers. Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system. Processed multiple data source inputs to the same Reducer using GenericWritable and MultipleInputs. Performed data profiling and transformation on the raw data using Pig and Python. Visualized the HDFS data for customers in a BI tool with the help of the Hive ODBC Driver. Created Hive Generic UDFs to process business logic that varies based on policy. Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
  • Monitored the cluster using Cloudera Manager. Developed predictive analytics using Apache Spark Scala APIs. Implemented MapReduce counters to gather metrics on good and bad records. Built data governance processes, procedures and controls for the data platform using NiFi. Created real-time data streaming solutions and batch-style large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume. Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner. Involved in designing Kafka for a multi-data-center cluster and monitoring it. Responsible for importing real-time data from sources into Kafka clusters. Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed Python Spark streaming scripts to load raw files. Implemented PySpark logic to transform and process various formats of data like XLS, JSON and TXT. Developed Python scripts to fetch S3 files using the Boto3 module.
  • Built scripts to load PySpark-processed files into a Redshift database using diverse PySpark logic. Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources. Processed metadata files into AWS S3 and an Elasticsearch cluster. Developed Spark code using Scala and Spark-SQL/S
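
A minimal Scala sketch of the Hive/SQL-to-Spark conversion pattern referenced above; the claims table, policy_id column and output table name are placeholder assumptions, not the actual project objects:

    import org.apache.spark.sql.SparkSession

    object HiveToSparkSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-spark-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // DataFrame equivalent of: SELECT policy_id, COUNT(*) FROM claims GROUP BY policy_id
        val claims = spark.table("claims")                        // hypothetical Hive table
        val counts = claims.groupBy("policy_id").count()

        // The same aggregation expressed as RDD transformations
        val countsRdd = claims.rdd
          .map(row => (row.getAs[String]("policy_id"), 1L))
          .reduceByKey(_ + _)

        counts.write.mode("overwrite").saveAsTable("claims_counts") // hypothetical target table
        spark.stop()
      }
    }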

Confidential

Spark Developer

Responsibilities:

  • Involved in tuning the Spark modules with various memory and resource allocation parameters, setting the right batch interval time and varying the number of executors to meet the increasing load over time. Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs. Used the in-memory computing capabilities of Spark with Scala and performed advanced procedures like text analytics and processing. Responsible for performing sort, join, aggregation, filter and other transformations on the datasets using Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, transformations and other operations during the ingestion process itself. Implemented Spark SQL to connect to Hive, read the data and distribute processing to make it highly scalable. Expertise in implementing Spark using Scala and Spark SQL for faster testing and processing of data, and responsible for managing data from different sources. Worked on analyzing the Hadoop cluster and different Confidential analytic tools including Hive, the HBase NoSQL database and Sqoop.
  • Worked on RDD and DataFrame techniques in PySpark for processing data at a faster rate. Developed batch scripts to fetch data from AWS S3 storage and perform the required transformations. Developed Spark code using Scala and Spark SQL for faster processing of data. Developed a workflow in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig. Implemented Spark using Scala and utilized DataFrames and the Spark SQL API for faster processing of data. Built Spark scripts by utilizing Scala shell commands depending on the requirement.
  • Involved in performance tuning of Spark applications, fixing the right batch interval time and tuning memory. Developed Spark scripts and Python functions that perform transformations and actions on datasets. Configured Spark Streaming in Python to receive real-time data from Kafka and store it onto HDFS. Developed a Kafka consumer API in Scala for consuming data from Kafka topics. Experienced in writing live real-time processing and core jobs using Spark Streaming with Kafka as a data pipeline system in Scala (a minimal sketch of this pattern follows this list).
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala. Developed Spark scripts using Scala shell commands as per the requirements. Continuously monitored and managed the Hadoop cluster using Cloudera Manager. Good knowledge of setting up batch intervals, slide intervals and window intervals in Spark Streaming using the Scala programming language. Implemented Spark SQL with various data sources like JSON, Parquet, ORC and Hive.
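
A minimal Scala sketch of the Spark Streaming with Kafka pattern referenced above, using the spark-streaming-kafka-0-10 integration; the broker address, topic name, consumer group and HDFS output path are placeholder assumptions:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    object KafkaToHdfsSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-to-hdfs-sketch")
        val ssc  = new StreamingContext(conf, Seconds(10))        // 10-second batch interval

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",                 // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "spark-consumer-sketch",        // placeholder consumer group
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

        // Persist each non-empty micro-batch of message values to HDFS
        stream.map(_.value()).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty())
            rdd.saveAsTextFile(s"hdfs:///data/raw/events/${time.milliseconds}") // placeholder path
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }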

Environment: HDFS, Spark, Scala, Cloudera, Apache Hadoop 2.6.0 (YARN), Flume, Eclipse, AWS, Cassandra, MySQL, Oozie, ZooKeeper, NiFi, Kafka, Hortonworks, MapReduce, Pig, Hive and HBase.

Confidential

Sr. Hadoop Developer

Responsibilities:

  • Developed data pipelines using Flume, Sqoop, Pig and MapReduce to ingest data into HDFS for analysis. Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables. Implemented Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data into HDFS through Sqoop (a minimal sketch of this pattern follows this list). Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
  • Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming. Experience with the complete SDLC process: staging, code reviews, source code management and the build process. Implemented Confidential platforms using Cloudera CDH4 as the data storage, retrieval and processing system. Experienced in Spark Core, Spark SQL and Spark Streaming. Performed transformations on the data using different Spark modules. Wrote Sqoop scripts for importing large data sets from Teradata into HDFS.
  • Performed data ingestion from multiple internal clients using Apache Kafka. Wrote MapReduce jobs to discover trends in data usage by the users. Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables. Loaded and transformed large sets of structured, semi-structured and unstructured data using Pig. Experienced in working with Pig to do transformations, event joins, filtering and some pre-aggregations before storing the data onto HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting. Involved in developing Hive UDFs for needed functionality that is not available out of the box in Hive. Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python. Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts. Responsible for executing Hive queries using the Hive command line, the HUE web GUI and Impala to read, write and query data in HBase.
  • Developed and executed Hive queries for denormalizing the data. Developed the Apache Storm, Kafka and HDFS integration project to do real-time data analysis. Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Automated the end-to-end processing using Oozie workflows and coordinators. Experience loading and transforming structured and unstructured data into HBase, with exposure to handling automatic failover in HBase. Ran POCs in Spark to benchmark the implementation.
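
A minimal Scala sketch of the Spark DataFrame/SQL aggregation-and-write pattern referenced above; the staging.orders table, its columns and the HDFS output path are placeholder assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{sum, udf}

    object AggregationSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("aggregation-sketch")
          .enableHiveSupport()
          .getOrCreate()
        import spark.implicits._

        // Hypothetical staging table previously loaded from the source RDBMS via Sqoop
        val orders = spark.table("staging.orders")

        // Simple UDF that normalizes a free-text region code (illustrative only)
        val normalizeRegion = udf((r: String) => Option(r).map(_.trim.toUpperCase).getOrElse("UNKNOWN"))

        orders
          .withColumn("region", normalizeRegion($"region"))
          .groupBy($"region")
          .agg(sum($"amount").as("total_amount"))
          .write.mode("overwrite")
          .parquet("hdfs:///warehouse/aggregates/orders_by_region")  // placeholder output path
      }
    }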

Environment: Cloudera, Java, Scala, Hadoop, Spark, HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Amazon Web Services

Confidential

Java/Hadoop Developer

Responsibilities:

  • Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop. Developed a workflow in Oozie to automate the tasks of loading data into HDFS. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in creating Hive tables, loading data and writing queries that run internally as MapReduce jobs. Involved in creating Hive external tables for HDFS data. Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
  • Used Spark for transformations, event joins and some aggregations before storing the data into HDFS. Troubleshot and resolved data quality issues and maintained a high level of data accuracy in the data being reported. Analyzed large data sets to determine the optimal way to aggregate.
  • Worked on Oozie workflows to run multiple Hive and Pig jobs. Involved in creating Hive UDFs. Developed automated shell scripts to execute Hive queries. Involved in processing ingested raw data using Apache Pig. Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked on different file formats like JSON, Avro, ORC and Parquet and compression codecs like Snappy, zlib and LZ4. Executed HiveQL in Spark using Spark SQL (a minimal sketch of this pattern follows this list). Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala. Gained knowledge in creating Tableau dashboards for reporting on analyzed data. Expertise with NoSQL databases like HBase. Experienced in managing and reviewing the Hadoop log files. Used GitHub as a repository for committing and retrieving code and Jenkins for continuous integration.
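
A minimal Scala sketch of the file-format and HiveQL-through-Spark-SQL patterns referenced above; the input/output paths, compression choice and analytics.events table are placeholder assumptions:

    import org.apache.spark.sql.SparkSession

    object FormatsAndHiveQlSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("formats-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read raw JSON and write columnar copies with Snappy compression (placeholder paths)
        val events = spark.read.json("hdfs:///data/raw/events")
        events.write.mode("overwrite").option("compression", "snappy").parquet("hdfs:///data/parquet/events")
        events.write.mode("overwrite").option("compression", "snappy").orc("hdfs:///data/orc/events")

        // Run HiveQL through Spark SQL against a hypothetical partitioned Hive table
        spark.sql(
          """SELECT event_type, COUNT(*) AS cnt
            |FROM analytics.events
            |WHERE dt = '2016-01-01'
            |GROUP BY event_type""".stripMargin).show()
      }
    }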

Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Shark, Spark, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.

Confidential

Java Developer

Responsibilities:

  • Developed the use cases and class diagrams using Rational Rose/UML. Used ORM in the persistence layer and implemented DAOs to access data from Oracle and MySQL databases. Used XML, WSDL, UDDI and SOAP web services (JAX-WS) with the Apache Axis2 framework for communicating data between different applications.
  • Stored the SOAP messages received in the JMS queue of WebSphere MQ (MQ Series). Developed data access beans and EJBs that are used to access data from the database. Used EJB to inject the services and their dependencies. Involved in coding HTML, CSS and JavaScript for UI validation, dynamic manipulation of the elements on the screen and validation of user input. Wrote PL/SQL and SQL blocks for the application. Used core Java multi-threading concepts to avoid issues with concurrent processing.
  • Responsible for deploying application files on the IBM WebSphere Application Server. Used the Log4j package for logging, ANT for automated deployment and JUnit for testing.

Environment: J2EE, JDK, EJB 1.x, JavaBeans, SOAP Web Services, Apache Axis1, JSR-286 Portlet, JSF 2.x, JMS, JSP, XML, JNDI, Design Patterns, TOAD, IBM WebSphere, JUnit, ANT, PL/SQL, Oracle 9i, MySQL, Rational Rose, Unix.

Confidential

Software Engineer

Responsibilities:

  • Designed, developed and executed data migration from a DB2 database to an Oracle database using Linux scripts, Java and SQL*Loader concepts. A key member of the team, playing a central role in articulating the design requirements for the development of automated tools that perform error-free configuration.
  • Developed UNIX and Java utilities for data migration from DB2 to Oracle. Sole developer and POC for the migration activity. Developed JSP pages, Servlets and HTML pages as per requirements. Developed the necessary JavaBeans and PL/SQL procedures for the implementation of business rules.
  • Developed the user interface using JavaServer Pages (JSP), HTML and JavaScript for the presentation tier. Developed JSP pages with client-side validation in JavaScript. Developed a custom realm for the Apache Tomcat Server to authenticate users. Developed a front-end controller Servlet to handle all requests.
  • Developed the web interface using JSP and developed Struts action classes. Responsible for both functional and non-functional requirements gathering, performing impact analysis and testing the solutions on a build-by-build basis.
  • Coded using Java, JavaScript and HTML. Used JDBC to provide database connectivity to tables in Oracle (a minimal sketch of this pattern follows this list). Used the WebSphere Application Server for application deployment. Followed the full Software Development Life Cycle: requirements analysis, design, development, testing, deployment and support.
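
A minimal JDBC sketch of the Oracle connectivity pattern referenced above, written in Scala for consistency with the other sketches (the original work was in Java); the connection URL, credentials and customers table are placeholder assumptions:

    import java.sql.DriverManager

    object JdbcSketch {
      def main(args: Array[String]): Unit = {
        // Placeholder connection details; real values would come from configuration
        val url  = "jdbc:oracle:thin:@//dbhost:1521/ORCL"
        val conn = DriverManager.getConnection(url, "app_user", "app_password")
        try {
          // Parameterized query against a hypothetical customers table
          val stmt = conn.prepareStatement("SELECT id, name FROM customers WHERE status = ?")
          stmt.setString(1, "ACTIVE")
          val rs = stmt.executeQuery()
          while (rs.next())
            println(s"${rs.getLong("id")}  ${rs.getString("name")}")
        } finally {
          conn.close()
        }
      }
    }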

Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.
