Hadoop Developer Resume MD - Hire IT People

PROFESSIONAL SUMMARY

9 years of professional IT industry experience encompassing a wide range of skills in Big Data and Java/J2EE technologies.
4 years of experience working with Big Data technologies on systems comprising massive amounts of data running in highly distributed mode on Cloudera and Hortonworks Hadoop distributions.
Hands-on experience using Hadoop ecosystem components like Hadoop, Hive, Pig, Sqoop, HBase, Cassandra, Spark, Spark Streaming, Spark SQL, Oozie, ZooKeeper, Kafka, Flume, MapReduce and YARN.
Excellent Core Java development skills and familiarity with coding business components using various Java APIs such as Multithreading and Collections.
Wrote complex HiveQL queries for required data extraction from Hive tables and wrote Hive UDFs as required.
Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
Experienced working with Spark Streaming, Spark SQL and Kafka for real-time data processing.
Strong experience troubleshooting Spark applications and addressing performance considerations for efficient memory handling.
Worked on HBase to load and retrieve data for real-time processing using the REST API.
Developed ETL workflows in Python for processing data in HDFS and HBase using Oozie.
Experience configuring ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
Developed a NiFi flow prototype for data ingestion into HDFS.
Worked on setting up Apache NiFi and performing a POC with NiFi in orchestrating a data pipeline.
Good experience in general data analytics on distributed computing clusters such as Hadoop using Apache Spark, Impala, and Scala.
Strong experience analyzing large data sets by writing PySpark scripts and Hive queries.
Extensive experience working with various enterprise Hadoop distributions, including Cloudera (CDH5) and Hortonworks, and good knowledge of Amazon EMR (Elastic MapReduce).
Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
Developed various shell scripts and Python scripts to automate Spark jobs and Hive scripts.
Experienced in using Pig scripts to do transformations, event joins, filters and some pre-aggregations before storing the data onto HDFS.
Experience with Maven and ANT for continuous integration and builds.
Experience in implementing MVC frameworks like JSF and Spring MVC, and ORM tools like Hibernate, in J2EE architecture.
Developed complex database objects like stored procedures, functions, packages and triggers using SQL and PL/SQL.

TECHNICAL SKILLSET

Big Data: HDFS, MapReduce, Hive, Pig, ZooKeeper, Apache Spark, NiFi Core, MLlib, Hortonworks, Spark SQL and DataFrames

Utilities: Sqoop, Flume, Kafka, Oozie and AutoSys

NoSQL Databases: HBase, Cassandra

Languages: C, C++, Java, Python, J2EE, PL/SQL, MR, Pig Latin, HiveQL, Unix shell scripting and Scala

Operating Systems: Sun Solaris, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Databases and Data Warehousing: Teradata, DB2, Oracle 9i/10g/11g, SQL Server, MySQL

Tools and IDE: Maven, Toad, Eclipse, NetBeans, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

PROFESSIONAL EXPERIENCE

Confidential, MD

Hadoop Developer

Responsibilities:

Working knowledge of Spark RDD, DataFrame API, Dataset API, Data Source API, Spark SQL and Spark Streaming.
Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Performed Spark join optimizations, troubleshot and monitored jobs, and wrote efficient code using Scala.
Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
Worked on analyzing the Hadoop stack and different big data analytic tools including Pig, Hive, HBase database and Sqoop.
Developed a Spark Streaming script which consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
Experienced in implementing the Hortonworks distribution.
Created Hive tables and worked on them for data analysis to meet the requirements.
Developed a framework to handle loading and transforming large sets of unstructured data from UNIX systems to Hive tables.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Used Spark DataFrame operations to perform required validations on the data and to perform analytics on the Hive data.
Loaded all data sets from source CSV files into Hive using Spark and into Cassandra using Spark/PySpark.
Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
Migrated the computational code in HQL to PySpark.
Completed data extraction, aggregation and analysis in HDFS using PySpark and stored the required data in Hive.
Participated in development/implementation of Cloudera Hadoop environment.
Developed Python code to gather the data from HBase (Cornerstone) and designed the solution for implementation using PySpark.
Experienced in working with Elastic MapReduce (EMR).
Developed MapReduce programs for some refined queries on big data.
In-depth understanding of classic MapReduce and YARN architecture.
Worked with the business team in creating Hive queries for ad hoc access.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Implemented Hive Generic UDFs to implement business logic.
Analyzed the data using Hive queries, Pig scripts, Spark SQL and Spark Streaming.
Installed and configured Pig for ETL jobs.
Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
Involved in creating a generic Sqoop import script for loading data into Hive tables from RDBMS.
Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
Extracted files from Cassandra through Sqoop and placed them in HDFS for further processing.
Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
Created detailed AWS security groups which behaved as virtual firewalls that controlled the traffic allowed to reach one or more AWS EC2 instances.
Involved in creating a data lake by extracting customers' data from various data sources to HDFS, including data from Excel, databases, and log data from servers.
Performed data integration with a goal of moving more data effectively, efficiently and with high performance to assist in business-critical projects using Talend Data Integration.
Designed, developed, unit tested, and supported ETL mappings and scripts for data marts using Talend.
Used Hue for running Hive queries. Created daily partitions in Hive to improve performance.
Built a data flow pipeline using Flume, Java (MapReduce) and Pig.
Involved in Agile methodologies, daily Scrum meetings, Sprint planning.
Experience in using version control tools like GitHub to share code snippets among team members.
Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
Performed a POC on single-member debug on Spark and Hive.

Environment: Hadoop 2.x, Apache Spark, Spark SQL, DataFrames, Scala, HDFS, Hive, Oozie, Kafka, AutoSys, Oracle, Teradata, Python/PySpark, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Core Java, Cassandra, NiFi, Talend Big Data Integration, Cloudera Hadoop Distribution, PL/SQL, Toad, Windows NT, Linux

Confidential, NJ

Hadoop Developer

Responsibilities:

Worked on migrating MapReduce programs into Spark transformations using Scala.
Responsible for building scalable distributed data solutions using Hadoop.
Responsible for cluster maintenance, including adding and removing cluster nodes, cluster monitoring, troubleshooting, and managing and reviewing data backups and log files.
Wrote complex MapReduce jobs in Java to extract, transform and aggregate terabytes of data.
Collected and aggregated large amounts of stream data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
Analyzed data using Hadoop components Hive and Pig.
Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
Worked on setting up Apache NiFi and performed POC using NiFi in orchestrating data flows.
Migrated the computational code in HQL to PySpark.
Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently based on time and data availability.
Responsible for creating Hive tables, loading data and writing Hive queries to analyze data.
Generated reports using QlikView.
Worked with a Senior Engineer on configuring Kafka for streaming data.
Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
Developed and configured Kafka brokers to pipeline server log data into Spark Streaming.
Wrote several Hive queries to extract valuable information hidden in large datasets.
Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
Built Spark Scripts by utilizing Scala shell commands depending on the requirement.
Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
Collected all the logs from source systems into HDFS using Kafka and performed analytics on them.
Imported data from a Teradata database into HDFS and exported the analyzed patterns data back to Teradata using Sqoop.
Worked on Spark using Scala and Spark SQL for faster testing and processing of data.
Worked with Informatica to perform ETL jobs.
Performed POCs in the Spark test environment.
Wrote ETL jobs to visualize the data and generate reports from a MySQL database using DataStage.

Environment: Hadoop, HDFS, Hive, Pig, Flume, Python, Kafka, HBase, Scala, Sqoop, Oozie, DataStage, Linux, Hortonworks Distribution, Relational Databases

Confidential, Quincy, MA

Big Data Engineer

Responsibilities:

Extracted data from relational databases such as SQL Server and MySQL by developing Scala and SQL code
Uploaded the extracted data to Hive and combined the new tables with existing databases
Developed code to pre-process large sets of various types of file formats such as Text, Avro, Sequence files, XML, JSON and Parquet
Configured big data workflows to run on top of Hadoop, comprising heterogeneous jobs like Pig, Hive, Sqoop and MapReduce
Loaded various formats of structured and unstructured data from the Linux file system to HDFS
Used Combiners and Partitioners in MapReduce programming
Wrote Pig scripts to ETL the data into a NoSQL database for faster analysis
Read data from Flume and pushed batches of data to HDFS and HBase for real-time processing of the files
Parsed XML data into a structured format and loaded it into HDFS
Scheduled various ETL processes and Hive scripts by developing Oozie workflows
Utilized Tableau to visualize the analyzed data and performed report design and delivery
Created POC for Flume implementation
Involved in reviewing both functional and non-functional aspects of the business model
Took the lead in communicating and presenting these models to business customers and executives

Environment: Hadoop, HDFS, MapReduce, Sqoop, HBase, Shell Scripting, Pig, Hive, Oozie, Core Java, Hortonworks Distribution, Linux

Confidential, Albany, NY

Java/J2EE Developer

Responsibilities:

Analyzed the business requirements and prepared the design document.
Reviewed business requirements and discussed the design with the application architect.
Used the Value/Transfer Object, Singleton, Data Access Object and Factory design patterns.
Developed a batch process framework using the Executor Service framework to cascade multiple changes on multiple records in a single transaction.
Responsible for developing Java components using Spring, Spring JDBC and Spring Transaction Management.
Created and implemented microservices and REST APIs using Spring Boot, REST and JSON.
Used Spring JDBC in a persistence layer that is capable of handling high-volume transactions.
Implemented the service layer using Spring with transaction and logging interceptors.
Used Spring framework for middle tier and Spring-JDBC templates for data access.
Developed SOAP/REST based Web Services using both SOAP/WSDL and REST.
Participated in discussions with business analysts and analyzed the feasibility of the requirements.
Drew sequence diagrams and class diagrams using UML.
Created new tables and sequences, and wrote SQL queries and PL/SQL in Oracle.
Developed service layer by using Spring MVC.
Developed the user interface using JSF, JSP, HTML, JavaScript, CSS and Ajax. Produced and consumed SOAP web services.
Utilized Agile Methodologies to manage full life-cycle development of the project.
Implemented MVC design pattern using Spring Framework.
Used Maven and configured Jenkins to build and deploy the application.
Used Form classes of the Spring Framework to write the routing logic and to call different services.
Used Spring DAO to connect with the database.

Environment: Java JDK 1.7, Oracle 11g, Eclipse, Spring MVC, Web services, Microservices, Agile Methodology, Java/J2EE, SQL, PL/SQL, JSP, iBATIS, Apache Tomcat 7, HTML, JavaScript, JDBC, XML, XSLT, UML, JUnit, log4j, SVN and Maven.

Confidential

Java Developer

Responsibilities:

Designed, developed, tested and implemented scalable online systems using Java, J2EE, JSP, Servlets and Oracle Database.
Created UML class and sequence diagrams using Rational Rose.
Implemented the MVC architecture using Spring Framework.
Used JavaScript, HTML for creating interactive front-end screens.
Extensively used Custom JSP tags to separate presentation from application logic.
Developed JSF custom components and custom tag libraries for implementing the interfaces.
Developed Servlets, JSP pages, JavaScript and worked on integration.
Involved in developing presentation layer using JSPs and model layer using EJB Session Beans.
Coordinated with QA for testing, production releases, application deployment and integration, and conducted walk-through code reviews.
Involved in building and parsing XML documents.
Documented the whole source code developed.
Involved in writing SQL queries, stored procedures and PL/SQL for the back end.
Used Views and Functions at the Oracle Database end.
Developed Ant build scripts for compiling and building the application.
Used Maven as a build tool and wrote the dependencies for the JARs that needed to be migrated.
Configured and deployed the application on IBM WebSphere Application Server.
Developed JUnit test cases and performed integration and system testing.
Coordinated with other development teams, system managers and the webmaster, and fostered a good working environment.
Provided Core Java training to new hires.

Environment: Java, J2EE, JSP, MVC, Servlets, Spring, XML, HTML, JavaScript, JSON, Oracle, MySQL, JUnit, PL/SQL, JDBC, ANT script, Maven, IBM WebSphere

Confidential

JR Software Engineer

Responsibilities:

Worked with a team of developers on Python applications for risk management
Created SQL queries to pull data from the relational databases
Gathered business requirements and converted them into SQL stored procedures for database-specific projects
Developed the DAO layer for the application using Spring Hibernate Template support
Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
Designed the user interface and implemented validation checks using JavaScript.
Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
Developed various EJBs for handling business logic and data manipulations from database.
Involved in the design of JSPs and Servlets for navigation among the modules.
Designed cascading style sheets and the XML parts of the Order Entry Module and Product Search Module, and performed client-side validations with JavaScript.
Developed Tableau visualizations and dashboards using Tableau Desktop
Designed and developed data management system using MySQL
Wrote Python scripts to parse XML documents and load the data into the database
Expertise in writing constraints, indexes, views, stored procedures, cursors, triggers and user-defined functions
Created unit test/regression test framework for working/new code
Interfaced with third-party vendors to customize UI/UX solutions
Elegantly implemented page designs in standards-compliant dynamic XHTML and CSS

Environment: Python, Django, MySQL, Linux, HTML, XHTML, SVN, CSS, AJAX, Bugzilla, JavaScript, Apache Web Server
