- 6+ years of experience as a Hadoop Developer with ETL and Big Data frameworks.
- Well versed in developing MapReduce programs with Apache Hadoop to analyze big data per requirements.
- Solid understanding of the Software Development Life Cycle (SDLC) and experience working under the Agile V-model/Scrum framework.
- Working knowledge of Git, CI/CD.
- Experienced with major Hadoop ecosystem tools such as Hive, HBase, Pig, and Spark, and monitored them with Cloudera Manager.
- Proficiency in Linux programming environment, command line interface, Bash commands and Bash shell scripting.
- Extensive experience using Hive Query Language for data analytics.
- Hands-on experience with NoSQL databases, including HBase and MongoDB, and their integration with the Hadoop cluster.
- Experience implementing Spark/Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Good knowledge in using job scheduling and monitoring tools like Oozie.
- Extensive experience loading and analyzing large datasets with the Hadoop framework (MapReduce, HDFS, Hive, Sqoop, Spark) and NoSQL databases such as MongoDB and HBase.
- Good experience with design, coding, debugging, reporting, and data analysis using Scala, leveraging Scala libraries to speed up development.
- Worked with JIRA/Confluence for project development.
- Working knowledge of databases such as Oracle 10g/11g/12c, Microsoft SQL Server, DB2, and MySQL.
- Strong critical-thinking, decision-making, analytical, and problem-solving skills.
- Excellent verbal/written communication skills, including communicating technical issues to non-technical audiences.
- Experienced in working with large, collaborative, cross-platform teams.
- Intellectually curious, with a solutions-oriented attitude and an enjoyment of learning new tools and techniques.
Linux V.18.04 (Ubuntu), Cloudera V.5.13.0, Apache Hadoop V.2.6, Eclipse V.4.7, IntelliJ, MySQL V.14.14, Hive V.1.1.0, Scala V.2.10.5 & 2.11.12, Sqoop V.1.4.6, Spark (SQL & Streaming) V.1.6.0 & V.2.4.6, Pig V.0.12.0, HBase V.1.2.0, Version Control System (Git), Jenkins, YARN, JIRA, HDFS, MongoDB, & Oozie V.4.1.0. Intermediate knowledge of Kafka, AWS, S3, Redshift, EMR, EC2, & Kinesis.
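The Hive partitioning and bucketing work summarized above follows a common pattern; a minimal HiveQL sketch (database, table, column names, and paths are illustrative assumptions, not taken from any actual project):

```sql
-- Illustrative external table using the partitioning/bucketing pattern
-- described above; all names and paths are hypothetical.
CREATE EXTERNAL TABLE IF NOT EXISTS analytics.client_txns (
  txn_id     STRING,
  client_id  STRING,
  amount     DECIMAL(12,2)
)
PARTITIONED BY (ds STRING)              -- daily partitions enable pruning
CLUSTERED BY (client_id) INTO 32 BUCKETS -- bucketing speeds joins/sampling
STORED AS PARQUET
LOCATION '/data/curated/client_txns';

-- Partition pruning limits the scan to a single day of data.
SELECT client_id, SUM(amount) AS total
FROM analytics.client_txns
WHERE ds = '2020-01-15'
GROUP BY client_id;
```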
Confidential, New York, NY
- Worked directly with the Big Data Architecture Team which created the foundation for Enterprise Analytics initiative in a Hadoop-based Data Lake.
- Processed data in Spark as DataFrames and saved it in Parquet format to HDFS.
- Developed a data pipeline using Sqoop and Java MapReduce to ingest wealth-management clients’ data and financial histories into HDFS for analysis.
- Applied Spark performance and optimization techniques that improved existing algorithms and avoided high latency in Hadoop, using the SparkContext, Spark SQL, DataFrame, and Dataset APIs.
- Involved in migrating ETL processes from Oracle to Hive to validate easier data manipulation, and worked on importing and exporting data between Oracle/MySQL and HDFS/Hive using Sqoop.
- Created Hive external tables, loaded data into them, and queried it using HQL; worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
- Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
- Integrated Hive and HBase to load the results of Hive Query analysis into HBase.
- Converted files between formats such as text, Parquet, JSON, ORC, and Avro for better read and write performance.
- Applied Hive optimization techniques such as partitioning, bucketing, and indexing.
- Worked with complex HQL queries and performed joins for analysis.
- Gained hands-on experience with AWS cloud services such as EC2, S3, and RDS.
- Optimized Spark jobs for better reliability and performance, achieving low latency.
- Worked with both onshore and offshore teams and was responsible for updating management on day-to-day status.
- Attended daily Scrum meetings to assess the progress and review the accomplishments, targets and understand the issues.
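The DataFrame-to-Parquet flow described in the bullets above can be sketched in Scala; this is an illustrative outline only, and all application, path, and column names are hypothetical (they require a Spark cluster and HDFS to run):

```scala
// Minimal sketch of the Spark DataFrame -> Parquet pattern described above.
// App name, HDFS paths, and column names are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object ClientHistoryJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("client-history-ingest")
      .enableHiveSupport() // lets Spark SQL query Hive external tables
      .getOrCreate()

    // Read raw delimited data previously ingested into HDFS (e.g. via Sqoop)
    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///data/raw/client_histories")

    // Light transformation, then persist as Parquet for efficient reads
    raw.filter(raw("account_status") === "active")
      .write
      .mode("overwrite")
      .parquet("hdfs:///data/curated/client_histories")

    spark.stop()
  }
}
```

A job like this would typically be submitted with `spark-submit` on YARN, which matches the Yarn/MRv2 deployment mentioned above.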
Confidential, Columbus, OH
- Worked on analyzing Hadoop cluster and different big data analytic tools including Pig, Hive and Sqoop.
- Collaborated in identifying current problems, constraints, and root causes in data sets to define descriptive and predictive solutions supported by Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and developed reports in Tableau.
- Installed and configured Sqoop to import and export data between RDBMS and Map-FS, HBase, and Hive.
- Worked with the application team to design and develop an effective Hadoop solution.
- Developed and tested workflow scheduler job scripts in Apache Oozie.
- Developed a data pipeline using Sqoop, Pig, and Java MapReduce to ingest customer behavioral data into HDFS for analysis.
- Developed ELT processes from various data repositories and APIs across the enterprise, ensuring data quality and process efficiency.
- Worked with the team on creating the HBase data model from the current Oracle data model.
- Used Hive to create tables and was involved in loading results into HBase.
- Worked closely with data analysts to construct creative solutions for their analysis tasks and managed and reviewed Hadoop and HBase log files.
- Exported the analyzed data to the RDBMS using Sqoop for visualization and generated reports.
- Worked on importing and exporting data between Oracle and HDFS/Hive using Sqoop.
- Collaborated with application teams to install operating system and Hadoop updates, patches, version upgrades when required.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
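The Oozie-scheduled Sqoop-then-Hive ingest described in the bullets above typically looks like the following workflow definition; this is a sketch only, and the app name, node names, properties, and script name are all illustrative assumptions:

```xml
<!-- Sketch of a Sqoop-import-then-Hive-load workflow of the kind described
     above; names, properties, and paths are hypothetical. -->
<workflow-app name="customer-ingest-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>

  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table CUSTOMER_EVENTS --target-dir ${rawDir} -m 4</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>

  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_customer_events.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>

  <fail name="fail">
    <message>Ingest failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </fail>
  <end name="end"/>
</workflow-app>
```

An Oozie coordinator would usually trigger this workflow on a schedule, which pairs with the shell-script automation mentioned above.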