- Overall 7 years of experience in Java and Data Engineering using Hadoop, HDFS, MR2, YARN, Kafka, Pig, Hive, Sqoop, HBase, Cloudera Manager, ZooKeeper, Oozie, CDH5, AWS, Spark, Scala, Java development, and the Software Development Life Cycle (SDLC).
- Strong working knowledge of Agile methodologies, Scrum stories, and Sprints, with experience in Python environments, data analytics, and Excel data extracts.
- Sound knowledge of Big Data, Hadoop, NoSQL, and components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode, and the MapReduce2/YARN programming paradigm.
- Experienced in working with job/workflow scheduling and monitoring tools like Oozie.
- Hands-on experience with reporting tools like Tableau.
- Knowledge of distributed systems, HDFS architecture, and the anatomy of the MapReduce and Spark processing frameworks; worked on debugging and performance tuning of Hive jobs.
- Implemented Sqoop queries to import data from MySQL into Hadoop.
- Working knowledge of NoSQL databases such as HBase, Cassandra.
- Working knowledge of Scala programming.
- Proficient in applying performance tuning to SQL queries, Informatica mappings, session and workflow properties, and databases.
- Implemented Java tools in business, web, and client-server environments, including the Java Platform, J2EE, EJB, JSP, Servlets, Struts, Spring, and JDBC.
- Experience in data cleansing, extraction, pre-processing, transformation, and data mining.
- Around 3 years of experience in advanced statistical techniques, including predictive statistical models, segmentation analysis, customer profiling, survey design and analysis, and data mining with supervised and unsupervised learning models.
- Dynamic personality with strong problem-solving, analytical, communication, and interpersonal skills.
- Expertise in Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
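To illustrate the Sqoop imports from MySQL mentioned above, here is a minimal Python sketch that assembles a `sqoop import` command line; the JDBC URL, table name, and HDFS target directory are hypothetical placeholders, not values from any actual engagement:

```python
def build_sqoop_import(jdbc_url, table, target_dir, num_mappers=4):
    """Assemble a `sqoop import` command as an argument list.

    All connection details are illustrative stand-ins; in practice they
    would point at a real MySQL instance and an HDFS target directory.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,        # e.g. jdbc:mysql://host/db
        "--table", table,             # source table in MySQL
        "--target-dir", target_dir,   # HDFS destination directory
        "--num-mappers", str(num_mappers),  # parallel map tasks
    ]

# Hypothetical invocation: import the `orders` table into HDFS.
cmd = build_sqoop_import("jdbc:mysql://dbhost/sales", "orders", "/data/raw/orders")
```

The list form keeps the command safe to hand to a scheduler or `subprocess.run` without shell quoting issues.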
Programming Languages: C, C++, Java, Python, SQL
Web Technologies: HTML, XML, JSP, JSF
Hadoop Ecosystem: YARN, MR2, Sqoop, Hive, Pig, Flume, Oozie, Kafka, Spark
Hadoop Distribution: Hortonworks, Cloudera
Containerization: Docker
Relational Databases: MySQL, Teradata
NoSQL Databases: MongoDB, Cassandra, HBase
Reporting Tools: Tableau, Power BI
Operating Systems: Unix, Linux, Windows
Cloud Services (AWS): EC2, S3, EBS, RDS, VPC
Confidential, Auburn, Michigan
Sr. Hadoop/Spark Developer
- Worked on the Hadoop Ecosystem with tools like HBase and Sqoop.
- Responsible for building applications utilizing Hadoop.
- Involved in loading data from the Linux file system into HDFS.
- Worked on data recovery and capacity estimation.
- Created HBase tables to store data in variable formats coming from different portfolios.
- Implemented test scripts to support test-driven development and continuous integration.
- Uploaded and processed terabytes of data from various structured and unstructured sources into HDFS (AWS cloud) using Sqoop and Flume.
- Created data pipelines for ingestion and aggregation events and loaded consumer response data into an AWS S3 bucket.
- Played a key role in configuration of the various Hadoop ecosystem tools such as Kafka, Pig, HBase.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Ran data formatting scripts in Python and created CSV files to be consumed by Hadoop MapReduce jobs.
- Created Hive tables to store data in HDFS, loaded data, and wrote Hive queries that run internally as MapReduce jobs.
- Worked with the advanced analytics team to design fraud detection algorithms and then developed MapReduce programs to efficiently run the algorithm on the huge datasets.
- Worked on analyzing the Hadoop cluster using different big data processing tools, including Hive.
- Implemented Partitioning, Dynamic Partitions, Buckets in Hive.
Technologies Used: Hadoop, MapReduce, HDFS, Hive, Java, Sqoop, AWS, HBase, SQL
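A minimal sketch of the kind of Python data-formatting script described above, normalizing raw delimited records into CSV for downstream MapReduce jobs; the pipe delimiter and the sample record are assumptions, since the real feeds varied by source:

```python
import csv
import io

def to_csv_rows(raw_lines, delimiter="|"):
    """Normalize pipe-delimited raw records into clean field lists.

    Strips stray whitespace around fields and drops blank records.
    The delimiter and field layout are illustrative assumptions.
    """
    for line in raw_lines:
        fields = [f.strip() for f in line.rstrip("\n").split(delimiter)]
        if any(fields):  # skip completely empty records
            yield fields

def write_csv(raw_lines, out_file):
    """Write normalized records as CSV to a file-like object."""
    writer = csv.writer(out_file)
    for row in to_csv_rows(raw_lines):
        writer.writerow(row)

# Demo on a hypothetical raw feed with padding and a blank line.
buf = io.StringIO()
write_csv(["acct1 | 2017-01-03 | 42.50 \n", "\n"], buf)
```

The resulting CSV lines can then be placed on HDFS as clean input splits for MapReduce.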
Confidential, North Carolina
Sr. Hadoop Developer
- Imported and exported data into HDFS and Hive using Sqoop.
- Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data with MapReduce programs.
- Used parquet file format for published tables and created views on the tables.
- In charge of managing data coming from different sources.
- Supported running MapReduce programs in the cluster.
- Provided cluster coordination services through ZooKeeper.
- Involved in loading data from the UNIX file system into HDFS.
- Installed and configured Hive, and also wrote Hive UDFs.
- Automated all jobs that pull data from the FTP server and load it into Hive tables, using Oozie workflows.
- Wrote data to Parquet tables, both non-partitioned and partitioned, adding data dynamically to partitioned tables using Spark.
- Wrote user-defined functions (UDFs) for special functionality in Spark.
- Used Sqoop export functionality and scheduled the jobs daily with shell scripting in Oozie.
- Worked with Sqoop jobs to import data from RDBMS and used various optimization techniques to tune Hive and Sqoop.
- Used Sqoop import functionality to load historical data from a relational database system into HDFS.
Technologies Used: Hadoop, MapReduce, HDFS, Hive, Java, R, Sqoop, Spark
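The Hadoop Streaming jobs over XML data mentioned above can be sketched in Python as a streaming mapper; the `<order>` element, its child tags, and the tab-separated key/value output are illustrative assumptions, not the actual feed schema:

```python
import sys
import xml.etree.ElementTree as ET

def map_record(xml_record):
    """Emit one tab-separated key/value line for one XML record.

    The <order> schema here is a made-up stand-in for the real feed.
    """
    root = ET.fromstring(xml_record)
    order_id = root.findtext("id", default="")
    amount = root.findtext("amount", default="0")
    return f"{order_id}\t{amount}"

def run_mapper(lines, out=sys.stdout):
    """Hadoop Streaming feeds one record per line on stdin."""
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines between records
            print(map_record(line), file=out)
```

Under Hadoop Streaming this script would be passed as the `-mapper`, with a reducer aggregating by the emitted key.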
Confidential, Woodlands, TX
Big Data Engineer
- Collected raw files from FTP server and ingested files using proprietary ETL framework.
- Built new ETL packages using Microsoft SSIS. New packages included detailed workflow of data imports from client FTP server.
- Troubleshot ETL failures and performed manual loads using SQL stored procedures.
- Engineered client's platform by incorporating new dimensions onto the client's site using SQL Server Integration Services.
- Engineered new OLAP cubes that aggregated health provider's patient visit data.
Technologies Used: SQL, ETL, SSIS
- Designed, implemented, and maintained Java application phases.
- Took part in software and architectural development activities.
- Conducted software analysis, programming, testing, and debugging.
- Worked through all phases to develop, test, implement, and maintain application software.
- Recommended changes to improve established Java application processes.
- Developed technical designs for application development.
- Developed application code for Java programs.
- Developed servlet-based applications.
- Maintained existing modules and applications.
- Developed server side and client-side code for internal and external web applications.
Technologies Used: Java-based web services, relational databases, SQL and ORM, J2EE framework, Object-Oriented Analysis and Design, JSP, EJB (Enterprise Java Beans), XML, Test-Driven Development, HTML, CSS, Ubuntu