- 8 years of overall IT experience across all phases of the software development life cycle, including 5 years in the Hadoop and Big Data ecosystem.
- Strong experience with and knowledge of Hadoop architecture and its components, such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
- Good experience with Hadoop ecosystem tools such as Hadoop MapReduce, HDFS, NiFi, Oozie, Hive, Sqoop, Pig, ZooKeeper, Flume, Spark Streaming, Spark SQL, HBase and Cassandra.
- Expertise in Hadoop 2.0 and YARN architecture.
- Experience using Hadoop clusters on Cloudera’s CDH and Hortonworks HDP.
- Expertise in writing Hadoop jobs for analyzing data using MapReduce, Hive and Pig.
- Experience importing and exporting data between HDFS and relational database systems (RDBMS) using Sqoop.
- Developed and deployed Apache NiFi across various environments and wrote QA scripts in Python for tracking files.
- Expertise in writing custom UDFs and UDAFs to extend Hive and Pig core functionality.
- Experience implementing various Hadoop file formats and compression techniques, such as SequenceFile, Parquet, ORC, Avro, Z-Zip and text files.
- Experienced in using NoSQL databases such as HBase, Cassandra and MongoDB.
- Experience working with relational databases such as Oracle, MySQL and MS SQL Server.
- Experience writing UNIX shell and Bash scripts.
- Good experience implementing advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Apache Impala and Scala.
- Experience creating RDDs and DataFrames for the required data and performing transformations using Spark RDDs and Spark SQL.
- Used Spark Structured Streaming to perform necessary transformations.
- Experience writing producers/consumers and building messaging-centric applications using Apache Kafka.
- Hands-on experience with Amazon Web Services (AWS) such as EC2, Simple Storage Service (S3) and Elastic MapReduce (EMR).
- Extensive Java development experience using J2SE and J2EE technologies such as Servlets, Spring, Hibernate, JSP and JDBC.
- Experienced with core Java features such as the Collections Framework, exception handling, multithreading and the I/O system.
- Experience in SOA using SOAP and RESTful web services.
- Experience working with Waterfall and Agile development methodologies.
- Proficient in developing secure enterprise Java applications using technologies such as X-Servlets, Maven, Hibernate, XML, HTML, CSS and version control systems.
- Able to learn and adapt quickly to new tools and environments, with strong communication and analytical skills.
Big Data Ecosystem: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Apache Spark, Impala, Oozie.
Databases: Oracle, SQL Server, MySQL.
NoSQL Databases: HBase, Cassandra, MongoDB.
Hadoop Distributions: Cloudera, Hortonworks.
Cloud: AWS, Azure.
Languages: Java, Java SE, Java J2EE, Scala, Python, C.
Web Services: REST, SOAP, JAX-WS, JAX-RPC, JAX-RS, WSDL, Axis2, Apache HTTP.
Version Control: CVS, SVN.
IDE: Eclipse, NetBeans, IntelliJ.
Operating Systems: MacOS, Linux, Windows.
HADOOP/SPARK DEVELOPER
Confidential, Cleveland, OH
- Responsible for building scalable distributed data solutions using Hadoop.
- Performed ETL: data cleansing, transformation and preparation of data for reporting tools.
- Developed Spark and Hive jobs to apply rules and logic and to transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
- Used Spark Structured Streaming to perform transformations in a data lake that receives data from Kafka and writes it to HDFS.
- Created a Spark Streaming job to ingest live data from Kafka sources and implemented analysis models.
- Handled large datasets using repartition, coalesce, broadcast variables and Spark’s in-memory capabilities.
- Converted row-oriented Hive external tables into columnar, Snappy-compressed Parquet tables with key-value pairs. Also worked with other file formats such as CSV and text.
- Implemented hashing techniques such as UUID and MD5 for checksums and for identifying deltas.
- Applied transformations on data ingested by Informatica team as per business requirements.
- Used JDBC connectors to access reference and lookup tables in Oracle RDBMS.
- Wrote ad hoc queries in Hive for orchestration and unit testing.
- Created and scheduled Control-M jobs to run multiple Hive and Spark jobs, which run independently based on time and data availability.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Built end-to-end on-premises data pipelines.
- Assisted in setting up an Amazon EMR cluster and adding roles in Amazon IAM for the Disaster Recovery (DR) cluster.
- Created business-ready views on top of the master table and replicated data into Amazon S3.
- Created reports in Tableau to visualize the resulting data sets, and tested native Drill, Impala and Spark connectors.
- Used JIRA for task/defect tracking and SVN for version control.
Environment: Hadoop, Cloudera, HDFS, Hive, Oozie, SparkSQL, Sqoop, Control-M, Scala, Informatica, Tableau, Shell Scripting, Python, Oracle, AWS.
Confidential, Minneapolis, MN
- Interacted with business users to identify process metrics and key dimensions and measures; involved in the complete life cycle of the project.
- Responsible for writing Hive queries to analyze data in the Hive warehouse using Hive Query Language (HQL).
- Developed MapReduce jobs in Java for data cleaning and preprocessing.
- Good knowledge of using Apache NiFi to automate data movement.
- Used Map Reduce to ingest customer behavioral data and financial histories into HDFS.
- Used Pig as an ETL tool for transformations and pre-aggregations before storing data in HDFS.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it; exported result sets from Hive to MySQL using shell scripts.
- Handled importing of data from various data sources, performed transformations.
- Involved in creating tables and in partitioning and bucketing them.
- Configured Flume agents on different data sources to capture the streaming log data.
- Used Amazon EMR to process big data across a Hadoop cluster of virtual servers on Amazon EC2 and S3.
- Experience with different data formats such as Avro, Parquet and ORC, and compression codecs such as Snappy and Z-zip.
- Implemented a POC for persisting clickstream data with Apache Kafka.
- Optimized existing algorithms in Hadoop using Spark SQL.
- Troubleshot and resolved migration and production issues.
Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, AWS, Hortonworks, Kafka, Cassandra, UNIX, Tableau.
Confidential, New York City, NY
- Imported data into HDFS and exported data from HDFS using Sqoop.
- Worked on loading and transforming large sets of structured, semi-structured and unstructured data into the Hadoop system.
- Responsible for managing data coming from different data sources.
- Developed simple and complex MapReduce programs in Java for Data Analysis.
- Loaded data from various data sources into HDFS using Flume.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Developed Java MapReduce programs to analyze sample log files stored in the cluster.
- Created Hive tables, loaded data into them and wrote Hive UDFs.
- Responsible for spooling data from DB2 sources to HDFS using Sqoop.
- Created Hive tables and provided analytical queries for business user analysis.
- Extensive knowledge of Pig scripts using bags and tuples.
- Created partitioned and bucketed Hive tables for granularity and to optimize HiveQL queries.
- Involved in identifying job dependencies to design workflow for Oozie and resource management for YARN.
- Captured data from existing databases that provide SQL interfaces using Sqoop.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Pig and Hive, and wrote Pig and Hive UDFs.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
Environment: Cloudera, HBase, Java, Hive, Pig, Sqoop, Oozie, Oracle, SVN, Kafka, GitHub, JIRA, Talend.
- Extensively involved in all stages of the Agile development cycle, including detailed analysis, design, development and testing.
- Implemented the Back-End Business Logic using Core Java technologies including Collections, Generics, Exception Handling, Java Reflection and Java I/O.
- Wrote Spring annotation-based configuration to define beans and view resolvers, configuring Spring beans, their dependencies and the services they require.
- Used Spring IoC to implement dynamic dependency injection and Spring AOP to implement cross-cutting concerns such as transaction management.
- Wrote mapping configuration files to implement ORM mappings in the persistence layer.
- Extended DAO implementations using Hibernate DAO support.
- Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
- Implemented the Hibernate query cache using Ehcache to improve performance.
- Implemented RESTful web services using JAX-RS APIs.
- Integrated the JavaMail API to send registration confirmations and monthly statements to users.
- Manipulated database data with SQL queries, including writing stored procedures and triggers.
- Used Node.js and MongoDB to generate trend charts for the application’s payment history.
- Wrote JUnit test cases to test the service layers of the application.
- Used JIRA to track projects and Git for version control.