
Hadoop Developer Resume


MD

PROFESSIONAL SUMMARY

  • 9 years of professional IT industry experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
  • 4 years of experience working with Big Data technologies on systems comprising massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions.
  • Hands on experience in using Hadoop ecosystem components like Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and Yarn.
  • Excellent Core Java development skills and familiarity with coding business components using various Java APIs such as Multithreading and Collections.
  • Wrote complex HiveQL queries for required data extraction from Hive tables and wrote Hive UDFs as required.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Experienced in working with Spark Streaming, Spark SQL and Kafka for real-time data processing.
  • Strong experience troubleshooting Spark applications and applying performance considerations for efficient memory handling.
  • Worked on HBase to load and retrieve data for real-time processing using the REST API.
  • Worked on developing ETL workflows on data obtained using Python, processing it in HDFS and HBase using Oozie.
  • Experience in configuring ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
  • Worked on developing a NiFi flow prototype for data ingestion into HDFS.
  • Worked on setting up Apache NiFi and performing a POC with NiFi in orchestrating a data pipeline.
  • Good experience in general data analytics on distributed computing clusters like Hadoop using Apache Spark, Impala, and Scala.
  • Strong experience in analyzing large data sets by writing PySpark scripts and Hive queries.
  • Extensive experience in working with various distributions of Hadoop, including enterprise versions of Cloudera (CDH5) and Hortonworks, and good knowledge of Amazon's EMR (Elastic MapReduce).
  • Expertise in writing Hadoop Jobs for analyzing data using Hive QL (Queries), Pig Latin (Data flow language), and custom MapReduce programs in Java.
  • Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
  • Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
  • Experience with Maven and ANT for continuous integration and builds.
  • Experience in implementing MVC frameworks like JSF, Spring MVC and ORM tools like Hibernate in J2EE architecture.
  • Developed complex database objects like stored procedures, functions, packages and triggers using SQL and PL/SQL.

TECHNICAL SKILLSET

Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark, NiFi Core, MLlib, Hortonworks, Spark SQL and DataFrames

Utilities: Sqoop, Flume, Kafka, Oozie and AutoSys

NoSQL Databases: HBase, Cassandra

Languages: C, C++, Java, Python, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala

Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Databases and Datawarehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL

Tools and IDE: Maven, Toad, Eclipse, NetBeans, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

PROFESSIONAL EXPERIENCE

Confidential, MD

Hadoop Developer

Responsibilities:

  • Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL and Spark Streaming.
  • Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Performed Spark join optimizations, troubleshot and monitored jobs, and wrote efficient code in Scala.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Worked on analyzing the Hadoop stack and different big data analytics tools including Pig, Hive, HBase database and Sqoop.
  • Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
  • Experienced in implementing the Hortonworks distribution.
  • Created Hive tables and worked on them for data analysis to meet the requirements.
  • Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems into Hive tables.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
  • Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
  • Loaded all datasets from source CSV files into Hive using Spark and into Cassandra using Spark/PySpark.
  • Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
  • Migrated the computational code from HQL to PySpark.
  • Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the needed data in Hive.
  • Participated in development/implementation of Cloudera Hadoop environment.
  • Developed Python code to gather data from HBase (Cornerstone) and designed the solution to implement using PySpark.
  • Experienced in working with Elastic MapReduce (EMR).
  • Developed MapReduce programs for some refined queries on big data.
  • In-depth understanding of classic MapReduce and YARN architecture.
  • Worked with the business team in creating Hive queries for ad hoc access.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Implemented Hive generic UDFs for business logic.
  • Analyzed the data using Hive queries, Pig scripts, Spark SQL and Spark Streaming.
  • Installed and configured Pig for ETL jobs.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Involved in creating generic Sqoop import script for loading data into Hive tables from RDBMS.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
  • Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
  • Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Created detailed AWS security groups, which behaved as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
  • Involved in creating a data lake by extracting customer data from various sources into HDFS, including Excel files, databases, and server log data.
  • Performed data integration with the goal of moving more data effectively, efficiently and with high performance to assist business-critical projects using Talend Data Integration.
  • Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
  • Used Hue for running Hive queries. Created partitions by day in Hive to improve performance.
  • Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
  • Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
  • Experience in using version control tools like GitHub to share code snippets among team members.
  • Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
  • Performed a POC on single member debug on Spark and Hive.

Environment: Hadoop 2.x, Apache Spark, Spark-SQL, DataFrames, Scala, HDFS, HIVE, Oozie, Kafka, Autosys, Oracle, Teradata, Python/PySpark, MapReduce, Sqoop, HBase, Shell Scripting, PIG, Core Java, Cassandra, NiFi, Talend Big Data Integration, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, LINUX

Confidential, NJ

Hadoop Developer

Responsibilities:

  • Worked on migrating MapReduce programs into Spark transformations using Scala.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, including adding and removing cluster nodes, monitoring and troubleshooting the cluster, and managing and reviewing data backups and log files.
  • Wrote complex MapReduce jobs in Java to extract, transform and aggregate terabytes of data.
  • Collected and aggregated large amounts of stream data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
  • Analyzed data using Hadoop components Hive and Pig.
  • Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
  • Worked on setting up Apache NiFi and performed POC using NiFi in orchestrating data flows.
  • Migrated the computational code from HQL to PySpark.
  • Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently based on time and data availability.
  • Responsible for creating Hive tables, loading data and writing Hive queries to analyze data.
  • Generated reports using QlikView.
  • Worked with a senior engineer on configuring Kafka for streaming data.
  • Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
  • Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
  • Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
  • Wrote several Hive queries to extract valuable information hidden in large datasets.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Built Spark Scripts by utilizing Scala shell commands depending on the requirement.
  • Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
  • Collected all the logs from source systems into HDFS using Kafka and performed analytics on them.
  • Imported data from a Teradata database into HDFS and exported the analyzed patterns data back to Teradata using Sqoop.
  • Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
  • Worked with Informatica to perform ETL jobs.
  • Performed POCs on a Spark test environment.
  • Wrote ETL jobs to visualize the data and generate reports from a MySQL database using DataStage.

Environment: Hadoop, HDFS, Hive, Pig, Flume, Python, Kafka, HBase, Scala, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases

Confidential, Quincy, MA

Big Data Engineer

Responsibilities:

  • Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code
  • Uploaded the extracted data to Hive and combined new tables with existing databases
  • Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
  • Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs such as Pig, Hive, Sqoop and MapReduce
  • Loaded various formats of structured and unstructured data from Linux file system to HDFS
  • Used Combiners and Partitioners in MapReduce programming
  • Wrote Pig scripts to ETL the data into a NoSQL database for faster analysis
  • Read data from Flume and was involved in pushing batches of data to HDFS and HBase for real-time processing of the files
  • Parsed XML data into a structured format and loaded it into HDFS
  • Scheduled various ETL processes and Hive scripts by developing Oozie workflows
  • Utilized Tableau to visualize the analyzed data and performed report design and delivery
  • Created a POC for Flume implementation
  • Involved in reviewing both functional and non-functional aspects of the business model
  • Championed the communication and presentation of the models to business customers and executives, using the same

Environment: Hadoop, HDFS, Map Reduce, Sqoop, HBase, Shell Scripting, PIG, HIVE, Oozie, Core Java, Hortonworks Distribution, LINUX

Confidential, Albany, NY

Java/J2EE Developer

Responsibilities:

  • Understood the business requirements and prepared the design document.
  • Reviewed business requirements and discussed the design with the application architect.
  • Used the Value/Transfer Object, Singleton, Data Access Object, and Factory design patterns.
  • Developed a batch process framework using executive service framework to cascade multiple changes on multiple records in a single transaction.
  • Responsible for developing java components using Spring, Spring JDBC, Spring Transaction Management.
  • Created and implemented microservices and REST APIs using Spring Boot, REST, and JSON.
  • Used Spring JDBC in the persistence layer, which is capable of handling high-volume transactions.
  • Implemented the service layer using Spring with transaction and logging interceptors.
  • Used Spring framework for middle tier and Spring-JDBC templates for data access.
  • Developed web services using both SOAP/WSDL and REST.
  • Participated in discussions with business analysts and analyzed the feasibility of the requirements.
  • Drew sequence diagrams and Class diagrams using UML.
  • Created new tables and sequences and wrote SQL queries and PL/SQL in Oracle.
  • Developed service layer by using Spring MVC.
  • Developed the user interface using JSF, JSP, HTML, JavaScript, CSS, and Ajax. Produced and consumed SOAP web services.
  • Utilized Agile Methodologies to manage full life-cycle development of the project.
  • Implemented MVC design pattern using Spring Framework.
  • Used Maven and configured Jenkins to build and deploy the application.
  • Used form classes of the Spring Framework to write the routing logic and to call different services.
  • Used Spring DAO to connect with the database.

Environment: Java JDK 1.7, Oracle 11g, Eclipse, Spring MVC, Web services, Microservices, Agile Methodology, Java/J2EE, SQL, PL/SQL, JSP, iBATIS, Apache Tomcat 7, HTML, JavaScript, JDBC, XML, XSLT, UML, JUnit, log4j, SVN and Maven.

Confidential

Java Developer

Responsibilities:

  • Designed, developed, tested and implemented scalable online systems in Java, J2EE, JSP, Servlets and Oracle Database.
  • Created UML class and sequence diagrams using Rational Rose.
  • Implemented the MVC architecture using Spring Framework.
  • Used JavaScript, HTML for creating interactive front-end screens.
  • Extensively used Custom JSP tags to separate presentation from application logic.
  • Developed JSF custom components and custom tag libraries for implementing the interfaces.
  • Developed Servlets, JSP pages, JavaScript and worked on integration.
  • Involved in developing presentation layer using JSPs and model layer using EJB Session Beans.
  • Coordinated with QA for testing, production releases, application deployment and integration, and conducted walk-through code reviews.
  • Involved in building and parsing XML documents.
  • Documented the whole source code developed.
  • Involved in writing SQL queries, stored procedures and PL/SQL for the back end.
  • Used Views and Functions at the Oracle Database end.
  • Developed Ant build scripts for compiling and building the application.
  • Used Maven as a build tool and wrote the dependencies for the jars that need to be migrated.
  • Configured and deployed the application on IBM WebSphere Application Server.
  • Developed JUnit test cases and performed integration and system testing.
  • Coordinated with other development teams, system managers and the webmaster, and fostered a good working environment.
  • Provided training in Core Java to freshers.

Environment: Java, J2EE, JSP, MVC, Servlets, Spring, XML, HTML, JavaScript, JSON, Oracle, MySQL, JUnit, PL/SQL, JDBC, ANT script, Maven, IBM WebSphere

Confidential

JR Software Engineer

Responsibilities:

  • Worked with a team of developers on Python applications for risk management
  • Created SQL queries to pull data from the relational databases
  • Gathered business requirements and converted them into SQL stored procedures for database-specific projects
  • Developed the DAO layer for the application using Spring Hibernate Template support
  • Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
  • Designed the user interface and validation checks using JavaScript.
  • Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
  • Developed various EJBs for handling business logic and data manipulations from database.
  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Designed cascading style sheets and the XML part of the Order Entry Module & Product Search Module and did client-side validations with JavaScript.
  • Developed Tableau visualizations and dashboards using Tableau Desktop
  • Designed and developed data management system using MySQL
  • Wrote Python scripts to parse XML documents and load the data into the database
  • Expertise in writing constraints, indexes, views, stored procedures, cursors, triggers and user-defined functions
  • Created unit test/regression test framework for working/new code
  • Interfaced with third-party vendors to customize UI/UX solutions
  • Elegantly implemented page designs in standards-compliant dynamic XHTML and CSS

Environment: Python, Django, MySQL, Linux, HTML, XHTML, SVN, CSS, AJAX, Bugzilla, JavaScript, Apache Web Server
