- 9+ years of experience in Big Data Hadoop and Java/J2EE solution analysis, design, development, testing and deployment.
- Expertise in the Big Data Hadoop stack using Cloudera CDH4/CDH5, Amazon EMR, Hortonworks HDP 2.2/2.3 and MapR data platforms.
- Experience with Amazon cloud components: AWS EC2, EMR, S3, RDS, Redshift and Data Pipeline.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Strong background in object-oriented development including Perl, C++, Java, Scala and shell scripting.
- Expertise in data architecture including data ingestion, data cleansing, data modeling, transformations, data mining and advanced data analytics.
- Extensive experience using ecosystem components MapReduce, Hive, HQL, Pig, Sqoop, Oozie, Flume, Mahout, R, Cassandra, DSE, HBase and Apache Spark.
- In-depth understanding of MapReduce and YARN.
- Experience in writing Oozie workflows to run multiple jobs using Hive, Shell, Java, Oozie actions.
- Experience with Spark Core, Spark SQL, Spark Streaming, MLlib and GraphX.
- Expertise creating RDDs, pair RDDs, transformations and actions in Spark.
- Experience creating DataFrames and DStreams for relational data processing in Spark SQL.
- Experience integrating Kafka and Spark for real-time processing.
- Worked on Kafka MirrorMaker setup and data replication across data centers.
- Experience creating Kafka topics, partitions and replication.
- Experience in installing, configuring and monitoring Hadoop clusters.
- Developed Linux shell scripts for automating and testing jobs.
- Vast experience developing Java-based web, product and enterprise applications.
- Experience as an ETL developer using Talend Studio for Big Data.
- In-depth understanding of Talend components for HDFS, S3, Hive and Java.
- Able to assess business rules, collaborate with stakeholders and perform design and code reviews.
- Expertise in functional and technical design, development and implementation.
- In-depth understanding of Hadoop infrastructure and design patterns.
- Experience in both Waterfall and Agile development methodology.
- Extensive experience reviewing and analyzing requirements documents, business process flow diagrams and use cases.
- Experience with Hadoop cluster installation, configuration, administration and cluster management with Ambari.
- Experience in creation of reusable components using multiple technologies.
- Good understanding of ORC, RC, SequenceFile, Avro and Parquet file formats.
- Experience with SQL and NoSQL, relational and graph database systems.
- Expertise in Spring MVC controller development for data processing.
- Experience with Core Java, Collections, Servlets, JSPs, IoC, DI, Portlets, Liferay, Hibernate and JDBC.
- Experience creating and tracking user stories in Rally and VersionOne for Agile methodology.
- Experience in the Retail, Finance, CME and Telecom domains.
- Excellent communication and presentation skills.
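To illustrate the Spark RDD work summarized above, a minimal Scala sketch of transformations and actions on a pair RDD might look like this (the input path and comma-delimited field layout are hypothetical placeholders):

```scala
import org.apache.spark.sql.SparkSession

object RddSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("rdd-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Load raw lines, build a pair RDD of (key, count), then aggregate.
    val lines  = sc.textFile("hdfs:///data/landing/events")   // hypothetical path
    val pairs  = lines.map(_.split(","))
                      .filter(_.length >= 2)                  // transformation: drop malformed rows
                      .map(f => (f(0), f(1).toLong))          // transformation: pair RDD
    val totals = pairs.reduceByKey(_ + _)                     // transformation: aggregate per key
    totals.take(10).foreach(println)                          // action: triggers the computation

    spark.stop()
  }
}
```

Nothing executes until the `take` action runs; the preceding calls only build the lineage graph.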
Hadoop Skills: MapReduce, HDFS, Hive, Pig, Sqoop, R, Cassandra, DSE, Spark, Kafka, HBase, Oozie.
Apache Spark: Spark Core, Spark SQL, Spark Streaming, MLlib, GraphX.
AWS Cloud: Amazon EMR, S3, Redshift, EC2, RDS PostgreSQL, Data Pipeline.
Hadoop Distributions: Hortonworks, Pivotal, EMR, MapR, Cloudera.
Languages: Java, J2EE, C, C++, Scala.
Frameworks: Spring, Hibernate, Struts, Versata.
ETL: Talend ETL, Talend Studio, Talend for Big Data.
Databases: Oracle, MySQL, Teradata, DB2, Hive, AWS RDS.
Tools: Rally, VersionOne, GitHub, Eclipse, ClearCase, ClearQuest, Jenkins.
Methodologies: Agile, Waterfall, UML, Design Patterns.
Platforms: Windows XP, 7, 8.1, Linux CentOS, RHEL.
Servers: WebSphere, Darwin, Helix, Tomcat.
Confidential, Denver, CO
Sr. Hadoop Developer
- Responsible for migrating data from diversified data sources into Hadoop XNet platform.
- Responsible for creating low level designs for code implementations.
- Created Sqoop jobs for both historical and incremental data migration from legacy systems.
- Developed re-usable SFTP utility for flat files migration into XNet platform.
- Developed Kafka producer and consumer components for real time data processing.
- Created Kafka Topics, Partitions with replication factors across data centers.
- Implemented Kafka MirrorMaker for data replication across clusters.
- Integrated Kafka and Spark for real-time data processing.
- Responsible for creating Event Handler Classes for events processing.
- Created Spark SQL jobs with various transformations and actions for data processing.
- Created Spark jobs with RDDs, pair RDDs, transformations, actions and DataFrames for data transformations from relational stores.
- Created Hive tables using partitions, buckets, UDFs and HQL scripts in the landing layer for analytics.
- Developed folder watcher utility for continuous data migration to HDFS.
- Developed Oozie workflows for running sequential job flows in Production.
- Created mapping documents from legacy systems to Hadoop.
- Developed file parser utilities for parsing files per requirements.
- Responsible for user story creation, tracking and delivery per sprint planning.
Environment: Hortonworks HDP-2.3 YARN cluster, HDFS, Hive, Spark SQL, Scala, Spark Streaming, HBase, Kafka, Sqoop, Oozie, Control-M, Cassandra, DSE, OrientDB.
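The Kafka producer side of the real-time pipeline above can be sketched in Scala as follows; the broker addresses and the `xnet-events` topic name are placeholders, not the actual project configuration:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092") // placeholder brokers
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    // Publish one event; downstream, a Spark Streaming consumer reads this topic.
    producer.send(new ProducerRecord[String, String]("xnet-events", "key-1", "payload"))
    producer.close()
  }
}
```

In production the topic would be created with an explicit partition count and replication factor so consumers can scale out and survive broker failures.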
Confidential, Minneapolis, MN
Sr. Hadoop Developer
- Facilitated insightful daily analysis of both historical and incremental data sets spanning hundreds of terabytes.
- Developed MapReduce programs to convert data from file systems to canonical JSON files.
- Created Hive external and internal tables to load data from Landing to Foundation.
- Developed shell scripts to perform data loads in automated way and perform analysis.
- Created Hive queries that helped data analysis on customer purchase trends by comparing fresh data with EDW reference data and historical metrics.
- Generated Scala and Java classes from the respective APIs for incorporation into the overall application.
- Created partitioned and bucketed tables to organize data and achieved better performance.
- Responsible for creating Talend workflows to migrate data from Amazon S3 to Hadoop platform.
- Developed Talend ETL for data transformations, joins, filters, mappings and aggregations before storing data on Hadoop and exporting data to relational systems.
- Created Talend workflows for data enrichment and data cleansing using multiple components.
- Developed custom java requirements in Talend using tJava component.
- Responsible for data exports to MySQL and Redshift using Talend studio.
- Implemented RC, ORC, Sequence and Avro file formats in Hive to achieve better performance.
- Implemented data compression techniques to save space for historical data tables.
- Developed Sqoop historical jobs for data migration from Teradata and DB2 to Hadoop platform.
- Designed and developed Oozie workflows for Hadoop jobs execution in sequence.
- Implemented Shell, Sqoop, Hive and Java actions in Oozie for running multiple jobs in cluster.
- Implemented Oozie forking mechanism for achieving parallelism in Hadoop.
- Developed re-usable SFTP framework for data migration from external systems.
- Created HBase tables for handling updates in Hadoop with data loads.
- Participated in converting existing Hadoop jobs to Spark jobs using Spark Core and Spark SQL.
- Responsible for landing and exporting multi source data to HDFS using Talend.
- Wrote Scala classes to interact with the database.
Environment: Hortonworks HDP-2.2 YARN cluster, Talend Studio, HDFS, Map Reduce, Apache Hive, Apache Pig, Apache Spark, Sqoop, Oozie, EDW, ADW, Control-M, SFTP, HBase.
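The partitioned, bucketed tables and ORC storage mentioned above can be sketched as HiveQL issued through Spark SQL; the `foundation.purchases` table and its columns are illustrative names, not the project's actual schema:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-ddl-sketch")
      .enableHiveSupport() // use the Hive metastore so the table is visible to Hive jobs
      .getOrCreate()

    // Partition by load date for pruning, bucket by customer id for join
    // performance, and store as ORC for compressed columnar scans.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS foundation.purchases (
        customer_id BIGINT,
        item_id     BIGINT,
        amount      DOUBLE
      )
      PARTITIONED BY (load_date STRING)
      CLUSTERED BY (customer_id) INTO 32 BUCKETS
      STORED AS ORC
    """)

    spark.stop()
  }
}
```

Queries that filter on `load_date` then touch only the matching partition directories instead of scanning the whole table.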
- Developed data migration workflows from Amazon S3 to HDFS using Talend Studio.
- Developed data cleansing and transformation workflows using Talend.
- Wrote Pig scripts for data processing and KPI implementations in Hadoop.
- Implemented UDFs for KPIs and embedded them in Pig scripts.
- Wrote Talend workflows for data exports into RDBMS systems and cloud databases.
- Developed MapReduce programs for converting unstructured data into structured data.
- Performed code reviews, migrations from lower to higher environments.
- Developed Oozie jobs for executing sequence of job flows.
Environment: Cloudera CDH 5.5 Hadoop cluster, EC2, AWS S3, RDS, Talend Studio, HDFS, MapReduce, Apache Hive, Apache Pig, UDFs, Oozie, C#, Geo-click, QlikView.
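The unstructured-to-structured conversion described above can be sketched in plain Scala; the pipe-delimited log layout here is a hypothetical stand-in for the real feed, and the production jobs ran as MapReduce rather than a local loop:

```scala
// Parse semi-structured log lines into typed (timestamp, level, message) records.
object LogParser {
  case class LogRecord(timestamp: String, level: String, message: String)

  def parse(line: String): Option[LogRecord] =
    line.split("\\|", 3) match {
      case Array(ts, lvl, msg) => Some(LogRecord(ts.trim, lvl.trim, msg.trim))
      case _                   => None // drop malformed lines instead of failing the job
    }

  def main(args: Array[String]): Unit = {
    val raw = Seq("2016-01-01T10:00:00|INFO|job started", "garbage line")
    val structured = raw.flatMap(parse) // malformed input is silently filtered out
    structured.foreach(println)
  }
}
```

Returning `Option` from the parser keeps bad records from aborting a multi-terabyte batch; they can be counted and routed to an error path instead.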
- Performed requirements analysis based on client needs.
- Developed technical design documents as per functional design documents.
- Coded Java programs for application page and module development.
- Coded Java programs for creating financial documents.
- Developed reporting module for Advantage.
- Responsible for defects tracking and defect fixing in application.
- Integrated Adobe reader for PDF reporting module.
- Developed a validation framework for the Admin module across the entire application.
- Designed and developed visualization module for application as per client needs.
Environment: Java, WebSphere, Versata, ClearCase, ClearQuest, HTML.
- Performed requirements analysis based on client needs.
- Developed code as per technical design documents.
- Developed functionality for Video streaming using MP4IP tool.
- Analyzed the Darwin server and performed its installation and configuration.
- Developed code to embed the QuickTime streaming plugin for video streaming.
Environment: Java, FFMPEG, MP4IP, Darwin, Spring MVC, JSP, Tomcat, ClearCase, ClearQuest.
Associate Java Developer
- Executed required R&D as per project requirements.
- Performed requirement analysis as per client needs.
- Developed code to embed QR codes in the application.
- Developed Click to Call and Click to Connect functionality in the application.
Environment: Java, J2EE, Spring MVC, JSP, Hibernate, Tomcat, ClearCase, ClearQuest.