- Around 7 years of professional experience in the analysis, design, development, deployment, and maintenance of software and Big Data applications.
- 4 years of hands-on experience in Big Data implementation, with strong experience in major components of the Hadoop ecosystem: ingestion tools such as Sqoop, NiFi, StreamSets, and Kafka, and analytic tools such as Hive, Pig, Impala, and Spark.
- Solid background in DBMS technologies such as MySQL and NoSQL and in data warehousing architectures; performed migrations from SQL Server, Oracle, and MySQL databases to Hadoop.
- Experienced in developing MapReduce programs with Apache Hadoop for processing Big Data, with working knowledge of Hadoop architecture and the MapReduce programming paradigm.
- Used Pig to extract, transform, clean, and process large data sets and to store the results in HDFS.
- Experience with the Oozie workflow scheduler to manage Hadoop jobs as a Directed Acyclic Graph (DAG) of actions with control flows.
- Proficient in configuring ZooKeeper, Cassandra, and Flume on an existing Hadoop cluster.
- Experience in converting Hive or SQL queries into Spark transformations using Python and Scala.
- Experience using Spark to improve the performance of existing Hadoop algorithms with SparkContext, Spark SQL, PySpark, pair RDDs, and Spark on YARN.
- Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
- Experienced in working with Amazon Web Services, using EC2 for compute and S3 for storage.
- Experience with file formats such as text, Avro, and Parquet for Hive querying and processing.
- Experienced in managing Linux servers and in installing, configuring, supporting, and managing Hadoop clusters.
- Worked on different IDE tools like PyCharm, NetBeans and Eclipse.
- Ability to adapt quickly to new environments and technologies.
- Expertise in support activities including installation, configuration and successful deployment of changes across all environments.
- Excellent communication, interpersonal, and problem-solving skills; a team player.
Operating Systems: Windows, Linux distributions like Ubuntu, CentOS.
Data stores: NoSQL, MySQL, HBase, Cassandra, MongoDB
Big data: MapReduce, HDFS, Flume, Hive, Pig, Oozie, YARN, Hadoop, Kafka, Sqoop, Impala, Zookeeper, Spark, Mahout, MongoDB, Avro, Storm and Parquet.
Visualization: Tableau, Matplotlib, Shiny, ggplot2
Programming Languages: Java 1.7/1.8, Scala, Python, C, SQL, HTML, Pig Latin, Hive SQL, XML, UNIX Shell Scripting
Amazon Stacks: AWS EMR, S3, Aurora, DynamoDB, SageMaker and EC2
Application Servers: WebLogic 11g, 12c, Tomcat 5.x and 6.x
Java Technologies: Servlets, JSP, JDBC
Confidential, Hartford, CT
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Monitored and reviewed Hadoop log files and wrote queries to analyze them.
- Conducted POCs and mocks with the client to understand the business requirements; attended defect triage meetings with the UAT and QA teams to ensure defects were resolved in a timely manner.
- Worked with Kafka on a proof of concept for log processing on a distributed system.
- Studied the existing enterprise data warehouse setup and provided design and architecture suggestions for converting it to Hadoop using MRv2, Hive, Sqoop, and Pig Latin.
- Loaded data from sources such as Teradata and DB2 into HDFS using Sqoop and Flume, then loaded it into partitioned Hive tables.
- Developed Hive SQL queries, mappings, tables, and external tables in Hive for analysis across different banners; worked on partitioning, optimization, compilation, and execution.
- Wrote complex queries to load data into HBase; responsible for executing Hive queries using the Hive command line and Hue.
- Designed and implemented proprietary data solutions by correlating data from SQL and NoSQL databases using Kafka.
- Used Pig as an ETL tool to perform transformations and some pre-aggregations before storing the analyzed data in HDFS.
- Developed PySpark code for saving data in Avro and Parquet formats and building Hive tables on top of them.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Automated workflows using shell scripts to pull data from various databases into Hadoop.
- Developed bash scripts to fetch the TLog files from an FTP server and process them for loading into Hive tables. All the bash scripts were scheduled using the Resource Manager Scheduler.
- Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
- Developed Spark programs in Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
Environment: HDFS, Hadoop 2.x YARN, Teradata, NoSQL, PySpark, MapReduce, Pig, Hive, Sqoop, Spark, Scala, Oozie, Java, Python, MongoDB, Shell and bash Scripting.
Confidential, Atlanta, GA
- Migrated the Well’s data from an Oracle database into HDFS using Sqoop and created Hive tables.
- Involved in data migration from SAP HANA into HDFS using Sqoop.
- Worked on various tools such as Apache NiFi, Kafka, StreamSets, and Navigator for POCs.
- Created a workflow in Apache NiFi for pulling JSON data from a REST API into Hadoop on a scheduled basis.
- Worked on converting between file formats, such as JSON to CSV and Avro to ORC.
- Experience in automating Sqoop and Hive scripts using the Oozie workflow scheduler.
- Created tables in HBase from Hive tables in JSON format using SERDEPROPERTIES.
- Designed and implemented multi-column partitioning for various tables in Hive.
- Developed a workflow in Apache NiFi using WebSocket to pull incident data from GIS into HDFS.
- Created HBase tables for incremental data to perform upsert operations.
- Imported data from different sources into HDFS using Spark, analyzed data using SQLContext in Scala, and created tables in Hive using HiveContext.
- Consolidated small files in large data sets using Spark with Scala so that tables could be created on the data.
- Tuned the performance of Apache NiFi workflows to optimize ingestion speeds.
- Used Spark SQL to process large amounts of structured data and implemented Spark RDD transformations and actions.
- Experienced in making data consumable by other tools such as Safe Software FME, Alteryx, and Spotfire.
- Experience working with REST APIs using Node.js with Impala.
- Worked on pushing the archived data from HDFS into GIS (Geographic Information System).
- Developed Kafka producer and consumer programs to publish and consume data.
- Worked with StreamSets Data Collector to load data into HDFS from various sources such as local files, RDBMS databases, and APIs.
- Participated in developing a Node.js application to pull data in JSON format from Impala tables.
- Worked with the WebHDFS REST API, which provides web service access to data stored in HDFS.
- Developed various shell scripts.
Environment: Cloudera, HDFS, Sqoop, Hive, Kafka, NiFi, StreamSets, Shell Scripting, Spark, Scala, WebHDFS, HBase, Oracle, SAP HANA.
Confidential, Princeton, NJ
- Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Sqoop.
- Hands-on experience loading data from the UNIX file system and Teradata into HDFS.
- Analysed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most purchased product on the website.
- Installed, configured, and operated the Apache stack (Hive, HBase, Pig, Sqoop, ZooKeeper, Oozie, Flume, and Mahout) on a Hadoop cluster.
- Created a high-level design for the data ingestion and data extraction module; enhanced a Hadoop MapReduce job that joins the incoming slices of data and picks only the fields needed for further processing.
- Created Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
- Worked with AWS cloud services (VPC, EC2, S3, EMR, DynamoDB, SNS, SQS).
- Was part of the team that set up the infrastructure in AWS.
- Used Amazon S3 as a storage mechanism and wrote Python scripts that dump the data into S3.
- Tested raw data and executed performance scripts.
- Worked with NoSQL database HBase to create tables and store data.
- Developed industry-specific UDFs (user-defined functions).
- Used Flume to collect, aggregate, and store web log data from sources such as web servers and mobile and network devices, and pushed it to HDFS.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig, and Sqoop.
- Monitored workload and job performance and performed capacity planning using Cloudera Manager.
Environment: Cloudera, MapReduce, Hive, Solr, Pig, HDFS, HBase, Flume, Sqoop, Zookeeper, Python, Flat files, AWS, Unix/Linux.
- Actively participated in requirements gathering, analysis, design, and testing phases.
- Designed use case diagrams, class diagrams, and sequence diagrams as a part of Design Phase.
- Developed Enterprise JavaBeans (stateless session beans) to handle transactions such as online funds transfers and bill payments to service providers.
- Implemented Service Oriented Architecture (SOA) using JMS for sending and receiving messages while creating web services.
- Developed XML documents and generated XSL files for Payment Transaction and Reserve Transaction systems.
- Developed SQL queries and stored procedures.
- Used the JUnit framework for unit testing of all the Java classes.
- Implemented various J2EE design patterns like Singleton, Service Locator and SOA.
- Designed use cases, activities, states, objects and components.
- Performed validations between various users.
- Designed Java Servlets and objects using J2EE standards.
- Coded HTML, JSP and Servlets.
- Coded XML validation and file segmentation classes for splitting large XML files into smaller segments using the SAX parser.
- Created new connections through application coding for better access to the DB2 database; wrote SQL and PL/SQL stored procedures, functions, sequences, triggers, cursors, object types, etc.
- Involved in testing and deploying on the development server.
- Wrote PL/SQL stored procedures and called them using JDBC.
Environment: Java 1.7, J2EE, Apache Tomcat, CVS, JSP, Servlets, Struts, PL/SQL.