- 8+ years of overall experience in software development, including 4+ years in Hadoop development.
- Involved in analysis, design, implementation, testing, and deployment of web-based, distributed, and enterprise applications.
- Good experience with Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce and Spark programming paradigms.
- Experience in importing and exporting data to and from HDFS using Sqoop and NiFi.
- Experience with the Oozie workflow engine, running workflow jobs with actions that launch Hadoop MapReduce and Pig jobs.
- Hands-on experience with Spark features such as RDD transformations and Spark SQL.
- Experience in extraction, transformation, and loading of data in different file formats such as CSV, text files, sequence files, Avro, Parquet, JSON, and ORC, using file compression codecs such as gzip, LZ4, and Snappy.
- Good experience in creating data ingestion pipelines, data transformations, data management, data governance, and real-time streaming at an enterprise level.
- Extensive experience in creating real-time data streaming solutions using Apache Spark/Spark Streaming and Kafka.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Skilled at writing custom RDDs in Scala to apply data-specific transformations; also implemented design patterns to improve performance.
- Experienced in using Hue and Apache Ambari to manage and monitor Hadoop clusters.
- Worked with Oozie to manage and schedule jobs on Hadoop clusters.
- Expertise in full life cycle implementations using the CDH (Cloudera) and HDP (Hortonworks Data Platform) distributions.
- Good experience in using Apache NiFi to automate data movement between different Hadoop systems.
- Experience in applying current development approaches, including building Spark applications in Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
- Experience in handling messaging services using Apache Kafka
- Implemented Kerberos for strong authentication to provide data security.
- Built and published interactive customer reports and dashboards, and scheduled reports using Tableau Server.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Experience in working with different databases, both NoSQL and MySQL, along with exposure to Hibernate and JDBC for mapping an object-oriented domain model to a traditional relational database.
- Developed a POC on machine learning algorithm scripts.
- Good knowledge of ETL, data integration, and migration; extensively used ETL methodology to support data extraction, transformation, and loading.
- Good knowledge of cloud platforms such as GCP and AWS.
- Excellent interpersonal skills; good experience in interacting with clients; a good team player with strong problem-solving skills.
- Good experience working in agile development environments, including the Scrum methodology.
- Strong communication, logical, analytical, and interpersonal skills; an active team player.
Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper, NiFi
NoSQL Databases: Apache Cassandra, MongoDB, HBase
Web/Application servers: WebLogic, WebSphere, Apache Tomcat
Cloud Technologies: AWS, Microsoft Azure, GCP
Programming Languages: Java, PHP, C++, C, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Business Intelligence or Reporting tools: Tableau, Splunk
Development Tools: Eclipse, IntelliJ, NetBeans
Big Data Developer
Confidential, Herndon, VA
- Collected data from Kafka with Spark Streaming in near real time, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HBase.
- Wrote a Kafka consumer group to retrieve data generated by sensors tracking patients' body activities.
- Collected the data into HDFS from online aggregators using Kafka.
- The Kafka consumer retrieved the data from the patients' different learning systems.
- Used Hive and Spark SQL to analyze health insurance data, extracting data sets with meaningful information such as medicines, diseases, symptoms, opinions, and geographic region details.
- Developed workflows in Oozie to orchestrate a series of Pig scripts that cleanse data, such as removing personal information or merging many small files into a handful of very large compressed files, using Pig pipelines in the data preparation stage.
- Wrote Hive UDFs in Java and used them for sampling large data sets.
- Worked on transferring data from legacy tables to HDFS and HBase tables using Sqoop.
- Implemented test scripts to support test-driven development and continuous integration.
- Participated in NiFi development, architecture, and design discussions with the technical team, and interfaced with other teams to create efficient and consistent solutions.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Developed Spark programs to parse the raw data, populate staging tables and store the refined data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved the review process and resolved technical problems.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
Environment: Hadoop, MapReduce, Spark, Kafka, HDFS, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Cloudera, Oracle, NiFi, UNIX Shell Scripting.
- Worked on Spark SQL to handle structured data in Hive.
- Worked on creating Hive tables, loading data, writing Hive queries, and creating partitions and buckets for optimization.
- Migrated tables from RDBMS into Hive tables using Sqoop and later generated visualizations using Tableau.
- Worked on complex MapReduce programs to analyze data that exists on the cluster.
- Analyzed substantial data sets by running Hive queries and Pig scripts.
- Wrote Hive UDFs to sort struct fields and return complex data types.
- Worked in AWS environment for development and deployment of custom Hadoop applications.
- Created files and tuned SQL queries in Hive using Hue (Hadoop User Experience).
- Worked on collecting and aggregating large amounts of log data using Storm and staging data in HDFS for further analysis.
- Used Tableau to build customized interactive reports, worksheets, and dashboards.
- Managed real-time data processing and real-time data ingestion into MongoDB and Hive using Storm.
- Stored the processed results in the data warehouse and maintained the data using Hive.
- Worked with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Created Oozie workflow and coordinator jobs to kick off jobs on time for data availability.
- Worked with and learned a great deal about Amazon Web Services (AWS) cloud services such as EC2, S3, and EMR.
- Developed a proof of concept that uses Apache NiFi to ingest data from Kafka and convert raw XML data into JSON and Avro, and implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
Environment: Cloudera, HDFS, MapReduce, Storm, Hive, Pig, Sqoop, Apache Spark, Python, Accumulo, Oozie Scheduler, Kerberos, AWS, Tableau, Java, UNIX Shell scripts, Hue, NiFi, Git, Maven.
Hadoop Developer/ Data Analyst
Confidential, Seattle, WA
- Provided a solution using Hive and Sqoop (to export/import data) for faster data loads, replacing the traditional ETL process with HDFS for loading data into target tables.
- Maintained and monitored the Hive data warehouse: created tables, managed data distribution by implementing partitioning and bucketing, and wrote and optimized HiveQL queries.
- Designed Pig Latin scripts to sort, group, join and filter the data as part of data transformation as per the business requirements.
- Merged data files and loaded them into HDFS using Java code, and maintained the merge history in HBase.
- Collaborated with the Data Warehouse team to design and develop the required ETL processes and performance-tune ETL programs/scripts.
- Created Hive tables and worked with them using HiveQL.
- Wrote Apache Pig scripts to process HDFS data.
- Created Java UDFs in Pig and Hive.
- Created big data workflows to ingest data from various sources into Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Hive, Sqoop, and Python scripts.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Participated in the iteration planning process under the Agile Scrum methodology.
- Experienced in writing Hive JOIN queries.
- Involved in testing the Business Logic layer and Data Access layer using JUnit.
- Used a SQL database for writing SQL scripts and PL/SQL code for procedures and functions.
- Prepared technical reports and documentation manuals for efficient program development.
Environment: Java, HDP-2.2 YARN cluster, Tableau, HDFS, MapReduce, Apache Hive, Apache Pig, HBase, Sqoop, XML, SQL, UNIX, ETL.
Jr. Java Developer
- Worked closely with the business analysis team to understand the existing banking application.
- Designed the banking application (prepared use cases, sequence diagrams, class diagrams, etc.).
- Designed a database to be used by the Banking Applications (prepared ER Diagrams).
- Configured CVS and Tomcat.
- Used JDBC to interact with the database.
- Created an SQL Server database, which includes tables, views, triggers, constraints, stored procedures, functions, etc.
- Developed Test Cases and performed Unit Testing.
- Designed the Online Banking Application along with a database.
- Developed Servlets and JSPs for managing user registration and authentication as well as a limited set of transactions (including transferring money between checking and savings accounts, generating statements, etc.) and other services.
- Used JDBC in various servlets to interact with the database.
- Managed build process using Ant.
- Developed a student portal application using PHP and MySQL.
- Designed the My Account application using object-oriented PHP.
- Developed and maintained the Registration application in PHP.
- Migrated content from legacy PHP 4 to PHP 5 with Ajax enabled.
- Made major contributions to database design, schema creation, and RDBMS data migration.
- Involved in migration and re-engineering of MySQL database from 4.1 to 5.0.
- Tuned MySQL queries for better performance.
- The migration and re-engineering followed the SDLC pattern.
- Involved in all steps, including requirement analysis, TS design, test planning, development, testing, and deployment.
- Acted as a business functional analyst.
- Provided UAT (User Acceptance Testing) and end user
Environment: MySQL, phpMyAdmin, PHP 4.x, 5.x, Zend Studio, Web Services, XML/XSLT, jQuery, Ajax, Shell Script, Apache, UNIX.