Hadoop Developer Resume

CO

PROFESSIONAL SUMMARY:

  • Over 9 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • More than 6 years of work experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
  • Proficient with the Apache Spark ecosystem, including Spark and Spark Streaming, using Scala and Python.
  • Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
  • Expertise in developing responsive front-end components with JSP, HTML, XHTML, JavaScript, DOM, Servlets, JSF, Node.js, Ajax, jQuery and AngularJS.
  • Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
  • Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
  • Experience installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
  • Experience with AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
  • In-depth understanding and knowledge of Hadoop architecture and its components such as HDFS, MapReduce, Hadoop Gen2 Federation, High Availability and YARN, and good understanding of workload management, scalability and distributed platform architectures.
  • Strong experience and knowledge of real-time data analytics using Storm, Kafka, Flume and Spark.
  • Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
  • Experience in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the requirements.
  • Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
  • Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
  • Experience in extending Hive and Pig core functionality using custom UDFs and UDAFs.
  • Experience debugging MapReduce jobs using counters and MRUnit testing.
  • Expertise in writing real-time processing applications using spouts and bolts in Storm.
  • Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
  • Extensive experience working with Spark tools like RDD transformations, Spark MLlib and Spark SQL.
  • Experienced in moving data from different sources using Kafka producers and consumers, and in pre-processing data using Storm topologies.
  • Experienced in migrating ETL transformations using Pig Latin scripts, transformations and join operations.
  • Highly knowledgeable in streaming data from different sources, such as log files, JMS and application sources, into HDFS using Flume sources.
  • Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
  • Experience testing MapReduce programs using MRUnit and JUnit.
  • Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate and EJB.
  • Experience with different application servers like JBoss/Tomcat, WebLogic and IBM WebSphere.
  • Experience working with the onsite-offshore model.
  • Implemented a logging framework using the ELK stack (Elasticsearch, Logstash and Kibana) on AWS.

TECHNICAL SKILLS:

Big Data Frameworks: Hadoop (HDFS, MapReduce), Spark, Spark SQL, Spark Streaming, Hive, Impala, Kafka, HBase, Flume, Pig, Sqoop, Oozie, Cassandra.

Big Data distributions: Cloudera, Hortonworks, Amazon EMR

Programming languages: Scala, Python, Java, Shell scripting

Operating Systems: Windows, Linux (Ubuntu, CentOS), macOS

Databases: Oracle, SQL Server, MySQL

Designing Tools: UML, Visio

IDEs: IntelliJ IDEA, Eclipse, NetBeans

Web Technologies: XML, HTML, JavaScript, jQuery, JSON

Linux Experience: System Administration Tools, Puppet

Development methodologies: Agile, Waterfall

Application / Web Servers: Apache Tomcat, WebSphere

Messaging Services: RabbitMQ, Kafka, JMS

Version Tools: Git and CVS

Others: Putty, WinSCP, Data Lake, Talend, AWS, GCP

PROFESSIONAL EXPERIENCE

Hadoop Developer

Confidential, CO

Responsibilities:

  • Developed Spark scripts using Scala as per the requirements.
  • Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
  • Developed microservices using Python scripts with the Spark DataFrame APIs for the semantic layer.
  • Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig and Hive programs.
  • Used NiFi to transfer data from source to destination; responsible for handling batch as well as real-time Spark jobs through NiFi.
  • Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
  • Built the complete data ingestion pipeline using NiFi, which POSTs flow files through the InvokeHTTP processor to our microservices hosted inside Docker containers.
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Processed inputs from multiple data sources in the same reducer using GenericWritable and MultipleInputs. Performed data profiling and transformation on the raw data using Pig and Python.
  • Visualized the HDFS data for customers using a BI tool with the help of the Hive ODBC driver.
  • Created Hive generic UDFs to process business logic that varies based on policy.
  • Moved relational database data using Sqoop into Hive dynamic-partition tables via staging tables. Monitored the cluster using Cloudera Manager.
  • Developed predictive analytics using Apache Spark Scala APIs.
  • Implemented MapReduce counters to gather metrics of good records and bad records.
  • Built data governance processes, procedures and controls for the data platform using NiFi.
  • Created real-time data streaming solutions and batch-style large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka and Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Involved in designing and monitoring a multi-data-center Kafka cluster.
  • Responsible for importing real-time data from sources into Kafka clusters.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Developed Python Spark streaming scripts to load raw files and corresponding.
  • Implemented PySpark logic to transform and process various formats of data like XLS, JSON and TXT. Wrote Python scripts to fetch S3 files using the Boto3 module.
  • Built scripts to load PySpark-processed files into the Redshift database, using a variety of PySpark logic.
  • Developed MapReduce programs to cleanse the data in HDFS obtained from heterogeneous data sources. Processed metadata files into AWS S3 and an Elasticsearch cluster.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs; used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS on AWS.
  • Developed Spark applications using Scala for easy Hadoop transitions.
  • Migrated existing applications and developed new applications using AWS cloud services. Developed Python scripts to get the most recent S3 keys from Elasticsearch.
  • Implemented security in web applications using Azure and deployed web applications to Azure.
  • Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
  • Extracted data from SQL Server to create automated visualization reports and dashboards on Tableau.
  • Involved in collecting metrics for Hadoop clusters using Ganglia and Ambari.

Environment: Java, HDP, PySpark, Scala, Jenkins, Git, NiFi, Spark, MapReduce, Python, Talend, Hive, Pig, ZooKeeper, Kafka, HBase, VMware ESX Server, Flume, Sqoop, Oozie, Kerberos, Sentry, AWS, CentOS

Big Data / Spark Developer

Confidential - Dallas, TX

Responsibilities:

  • Involved in tuning the Spark modules with various memory and resource allocation parameters, setting the right batch interval and varying the number of executors to meet the increasing load over time.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Used the in-memory computing capabilities of Spark with Scala and performed advanced procedures like text analytics and processing.
  • Responsible for performing sort, join, aggregation, filter and other transformations on the datasets using Spark.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and other transformations during the ingestion process itself.
  • Implemented Spark SQL to connect to Hive, read the data and distribute processing for high scalability.
  • Implemented Spark applications using Scala and Spark SQL for faster testing and processing of data; responsible for managing data from different sources.
  • Worked on analyzing the Hadoop cluster and different Big Data analytics tools including Hive, the HBase NoSQL database and Sqoop.
  • Worked on RDD and DataFrame techniques in PySpark for processing data at a faster rate.
  • Developed the batch scripts to fetch the data from AWS S3 storage and perform the required transformations.
  • Developed Spark code using Scala and Spark SQL for faster processing of data.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Implemented Spark using Scala and utilized DataFrames and the Spark SQL API for faster processing of data.
  • Built Spark scripts using Scala shell commands depending on the requirements.
  • Involved in performance tuning of Spark applications, setting the right batch interval and tuning memory.
  • Developed Spark scripts and Python functions that perform transformations and actions on data sets.
  • Configured Spark Streaming in Python to receive real-time data from Kafka and store it in HDFS.
  • Developed a consumer in Scala using the Kafka consumer API for consuming data from Kafka topics.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline system, using Scala.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Good knowledge of setting up batch intervals, slide intervals and window intervals in Spark Streaming using Scala.
  • Implemented Spark SQL with various data sources like JSON, Parquet, ORC and Hive.

Environment: HDFS, Spark, Scala, Cloudera, Apache Hadoop 2.6.0 (YARN), Flume, Eclipse, AWS, Cassandra, MySQL, Oozie, ZooKeeper, NiFi, Kafka, Hortonworks, MapReduce, Pig, Hive and HBase.

Sr. Hadoop Developer

Confidential - Pittsburgh, PA

Responsibilities:

  • Developed data pipelines using Flume, Sqoop, Pig and MapReduce to ingest data into HDFS for analysis.
  • Developed Oozie workflows for daily incremental loads, which get data from Teradata and import it into Hive tables.
  • Implemented Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation, queries and writing data into HDFS through Sqoop.
  • Developed Pig scripts to transform the data into a structured format, automated through Oozie coordinators.
  • Developed a pipeline for continuous data ingestion using Kafka and Spark Streaming.
  • Experience with the complete SDLC process, including staging, code reviews, source code management and the build process.
  • Implemented Big Data platforms using Cloudera CDH4 for data storage, retrieval and processing.
  • Experienced in Spark Core, Spark SQL and Spark Streaming.
  • Performed transformations on the data using different Spark modules.
  • Wrote Sqoop scripts for importing large data sets from Teradata into HDFS.
  • Performed data ingestion from multiple internal clients using Apache Kafka.
  • Wrote MapReduce jobs to discover trends in data usage by the users.
  • Developed Flume configurations to extract log data from different sources and transfer data in different file formats (JSON, XML, Parquet) to Hive tables.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Pig.
  • Used Pig for transformations, event joins, filtering and some pre-aggregations before storing the data in HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Involved in developing Hive UDFs for functionality that is not available out of the box in Hive.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python.
  • Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
  • Responsible for executing Hive queries using the Hive command line, the Hue web GUI and Impala to read, write and query data in HBase.
  • Developed and executed Hive queries for denormalizing the data.
  • Developed the Apache Storm, Kafka and HDFS integration project for real-time data analysis.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs.
  • Automated the end-to-end processing using Oozie workflows and coordinators.
  • Experience loading and transforming structured and unstructured data into HBase, with exposure to handling automatic failover in HBase.
  • Ran POCs in Spark to benchmark the implementation.

Environment: Cloudera, Java, Scala, Hadoop, Spark, HDFS, MapReduce, YARN, Hive, Pig, ZooKeeper, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Amazon Web Services

Java/Hadoop Developer

Qualcomm

Responsibilities:

  • Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
  • Developed workflows in Oozie to automate the tasks of loading the data into HDFS.
  • Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting.
  • Involved in creating Hive tables, loading data and writing queries that run internally as MapReduce jobs.
  • Involved in creating Hive external tables for HDFS data.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they run as MapReduce jobs.
  • Used Spark for transformations, event joins and some aggregations before storing the data in HDFS.
  • Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
  • Analyzed large data sets to determine the optimal way to aggregate them.
  • Worked on the Oozie workflow to run multiple Hive and Pig jobs.
  • Involved in creating Hive UDFs.
  • Developed an automated shell script to execute Hive queries.
  • Involved in processing ingested raw data using Apache Pig.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Worked with different file formats like JSON, Avro, ORC and Parquet, and compression codecs like Snappy, zlib and LZ4.
  • Executed HiveQL in Spark using Spark SQL.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Gained knowledge of creating Tableau dashboards for reporting analyzed data.
  • Expertise with NoSQL databases like HBase.
  • Experienced in managing and reviewing the Hadoop log files.
  • Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.

Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Shark, Spark, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.

Java Developer

Confidential

Responsibilities:

  • Developed the use cases and class diagrams using Rational Rose/UML.
  • Used ORM in the persistence layer and implemented DAOs to access data from Oracle and MySQL databases.
  • Used XML, WSDL, UDDI and SOAP web services (JAX-WS) with the Apache Axis2 framework to exchange data between different applications.
  • Stored the SOAP messages received in the JMS queue of WebSphere MQ (MQ Series).
  • Developed data access beans and EJBs that are used to access data from the database.
  • Used EJB to inject the services and their dependencies.
  • Involved in coding HTML, CSS and JavaScript for dynamic manipulation of the elements on the screen and for validating user input.
  • Wrote PL/SQL and SQL blocks for the application.
  • Used core Java multithreading concepts to avoid concurrency issues.
  • Responsible for deploying the application file on IBM WebSphere Application Server.
  • Used Log4j for logging, Ant for automated deployment and JUnit for testing.

Environment: J2EE, JDK, EJB 1.x, Java Beans, SOAP Web Services, Apache-Axis1, JSR-286 Portlet, JSF 2.x, JMS, JSP, XML, JNDI, Design Patterns, TOAD, IBM WebSphere, JUnit, Ant, PL/SQL, Oracle 9i, MySQL, Rational Rose, Unix.

Software Engineer

Confidential

Responsibilities:

  • Designed, developed and executed data migration from a DB2 database to an Oracle database using Linux scripts, Java and SQL*Loader.
  • Key member of the team, playing a key role in articulating the design requirements for the development of automated tools that perform error-free configuration.
  • Developed UNIX and Java utilities for data migration from DB2 to Oracle. Sole developer and POC for the migration activity.
  • Developed JSP pages, Servlets and HTML pages as per requirements.
  • Developed the necessary Java Beans and PL/SQL procedures for the implementation of business rules.
  • Developed the user interface using JavaServer Pages (JSP), HTML and JavaScript for the presentation tier.
  • Developed JSP pages and client-side validation using JavaScript.
  • Developed a custom realm for Apache Tomcat Server for authenticating users.
  • Developed a front controller servlet to handle all requests.
  • Developed the web interface using JSP and developed Struts action classes.
  • Responsible for gathering both functional and non-functional requirements, performing impact analysis and testing the solutions on a build-by-build basis.
  • Coded using Java, JavaScript and HTML.
  • Used JDBC to provide database connectivity to database tables in Oracle.
  • Used WebSphere Application Server for application deployment.
  • Implemented the Software Development Life Cycle (Requirements Analysis, Design, Development, Testing, Deployment and Support).

Environment: J2EE, IBM DB2, IBM WebSphere Application Server, EJB, JSP, Servlets, HTML, CSS, JavaScript, Oracle database, Unix Scripting and Windows 2000.
