- Highly confident and skilled IT professional with 7 years of industry experience, including 4+ years of hands-on expertise in Big Data processing with Hadoop and its ecosystem (MapReduce, Pig, Spark, Scala, Hive, Sqoop, Flume, HBase, Cassandra, MongoDB, Akka), covering implementation, maintenance, ETL, and Big Data analysis operations
- Extensive experience in performing data processing and analysis using HiveQL, Pig Latin, custom MapReduce programs in Java, and Python scripts in Spark and Spark SQL
- Experience in Importing and exporting data from different databases like MySQL, Oracle, Teradata into HDFS and vice-versa using Sqoop
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map/Reduce and Pig jobs
- Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis
- Involved in designing the data model in Hive for migrating the ETL process into Hadoop, and wrote Pig scripts to load data into the Hadoop environment.
- Good knowledge in Amazon Web Services like Amazon EC2, IAM, EMR, S3 Storage, RedShift, DynamoDB, Aurora, and AWS Security Compliance Programs.
- Hands on experience in designing ETL operations including data extraction, data cleansing, data transformations, data loading
- Experience in importing and exporting data using Sqoop between HDFS and relational database systems (RDBMS) such as Teradata.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
- Strongly experienced in working with UNIX/Linux environments and writing shell scripts.
- Experience in developing workflows using Flume Agents with multiple sources like Web Server logs, REST API and multiple sinks like HDFS sink and Kafka sink.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Worked in Agile environment with active SCRUM participation.
- Excellent understanding of Software Development Life Cycle (SDLC) and strong knowledge on various project implementation methodologies including Waterfall and Agile.
- Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases, and with core Java concepts such as OOP, multithreading, collections, and I/O.
- Excellent problem-solving and analytical skills.
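The Sqoop import/export experience above can be illustrated with a minimal sketch; the connection string, table names, and HDFS paths here are hypothetical:

```shell
# Hypothetical Sqoop import from MySQL into HDFS; connection details are illustrative.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4

# Export aggregated results back to the RDBMS for reporting
sqoop export \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table order_summary \
  --export-dir /data/out/order_summary
```

Splitting the import across mappers (`--num-mappers`) parallelizes the transfer by ranges of the table's primary key.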
Hadoop Core Services: HDFS, Map Reduce, Spark, YARN
Hadoop Distributions: Cloudera CDH, Hortonworks HDP, MapR, Apache
Hadoop Data Services: Hive, Pig, Sqoop, Flume, Impala, Kafka
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Cloudera Manager, Ambari
Programming Languages: Java(J2EE), Python, Scala, SQL, Shell Scripting, C, PL/SQL
Operating Systems: Windows XP, Windows Server 2008, Linux, Unix
IDE Tools: Eclipse, IntelliJ, NetBeans
Application Servers: Red Hat JBoss
Databases: MySQL, HBase, Cassandra, MongoDB, Oracle
Cloud Services: Amazon Web Services, Microsoft Azure
Others: Git, PuTTY
Confidential, San Antonio, TX
Sr Hadoop Developer
- Involved in end-to-end data processing like ingestion, processing, and quality checks and splitting.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Streamed data in real time using Spark Streaming with Kafka.
- Developed multiple MapReduce jobs in Java and Python for data cleaning and preprocessing.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Created a Single Page Application that loads multiple views using route services, made more dynamic with the AngularJS framework.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Performed cluster configuration and inter- and intra-cluster data transfer using DistCp and HFTP.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Very good understanding of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Experience in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Performed different types of transformations and actions on the RDD to meet the business requirements.
- Built a REST web service with a Node.js server on the back end to handle requests sent from front-end jQuery Ajax calls.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Responsible for creating Hive tables and working on them using Hive QL.
- Responsible to manage data coming from various sources.
- Installed and configured Hive and wrote Hive UDFs.
- Provided cluster coordination services through ZooKeeper.
- Installed and configured Hadoop MapReduce, HDFS.
- Installed and configured Pig.
- Developed Spark scripts by using Scala as per the requirement.
- Designed and implemented MapReduce based large-scale parallel relation-learning system.
- Analyzed the Hadoop cluster and various big data analytic tools, including Pig, HBase, and Sqoop.
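The MapReduce data-cleaning work above can be illustrated with a minimal pure-Python sketch in the Hadoop Streaming style; the field layout and cleaning rules here are hypothetical:

```python
# Minimal Hadoop Streaming-style sketch in pure Python (hypothetical field layout).
import csv

def clean_mapper(lines):
    """Mapper: drop malformed records and emit tab-separated key/value pairs."""
    for row in csv.reader(lines):
        if len(row) != 3:                      # skip records with missing fields
            continue
        user, ts, amount = (f.strip() for f in row)
        if not amount.replace(".", "", 1).isdigit():
            continue                           # skip non-numeric amounts
        yield f"{user}\t{amount}"

def sum_reducer(pairs):
    """Reducer: aggregate values per key, as run after the shuffle phase."""
    totals = {}
    for line in pairs:
        key, value = line.split("\t")
        totals[key] = totals.get(key, 0.0) + float(value)
    return totals

raw = ["alice, 2016-01-01, 10.5", "bob, 2016-01-02, oops", "alice, 2016-01-03, 4.5"]
print(sum_reducer(clean_mapper(raw)))          # → {'alice': 15.0}
```

In a real Hadoop Streaming job, the mapper and reducer would read stdin and write stdout, with the framework performing the shuffle between them.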
Environment: Hadoop, MapReduce, Hive, Pig, Sqoop, Java, Node.js, Oozie, HBase, Kafka, Spark, Scala, Eclipse, Linux, Oracle, Teradata.
Confidential, Jersey City, NJ
Senior Hadoop Developer
- Collaborated closely with business analysts to transform business requirements into technical requirements, as well as to prepare low and high-level documentation.
- Performed transformations using Hive and MapReduce; gained hands-on experience copying .log and Snappy files into HDFS from Greenplum using Flume and Kafka; extracted data from MySQL into HDFS using Sqoop.
- Designing and developing MapReduce jobs to process data coming in different file formats like XML, CSV, JSON
- Involved in Apache Spark testing.
- Implemented MapReduce programs to handle semi/ unstructured data like XML, JSON, Avro data files and sequence files for log files
- Developed Spark applications using Scala for easy Hadoop transitions; hands-on experience writing Spark jobs and using the Spark Streaming API with Scala and Python.
- Imported required tables from RDBMS to HDFS using Sqoop and used Storm/ Spark streaming and Kafka to get real time streaming of data into HBase
- Experience in working with Hadoop components such as HBase, Spark, Yarn, Kafka, Zookeeper, PIG, HIVE, Sqoop, Oozie, Impala and Flume.
- Wrote Hive UDFs as per requirements to handle different schemas and XML data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig Scripts
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Assisted in problem solving with Big Data technologies for integration of Hive with HBase and Sqoop with HBase
- Designed and developed user-defined functions (UDFs) for Hive, developed Pig UDFs to pre-process data for analysis, and gained experience with UDAFs for custom data-specific processing.
- Involved in preparing the S2TM document per the business requirement and worked with source-system SMEs to understand source data behavior.
- Worked on NoSQL databases including HBase and Cassandra
- Installed Oozie workflow engine to run multiple Hive and Pig jobs
- Designed and developed the core data pipeline code, involving work in Java and Python, built on Kafka and Storm.
- Good knowledge on Partitions, bucketing concepts in Hive and designed both Managed and External tables in Hive for optimized performance
- Hands on experience on fetching the live stream data from DB2 to HBase table using Spark Streaming and Apache Kafka
- Performance tuning using Partitioning, bucketing of IMPALA tables
- Created POC (Proof of Concept) to store Server Log data into Cassandra to identify System Alert Metrics.
- Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper
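The partitioned and bucketed Hive table design mentioned above could be sketched as follows; the table and column names are hypothetical:

```sql
-- Hypothetical partitioned, bucketed Hive table; names are illustrative only.
CREATE TABLE transactions (
  txn_id    BIGINT,
  account   STRING,
  amount    DOUBLE
)
PARTITIONED BY (txn_date STRING)          -- lets queries prune partitions on date filters
CLUSTERED BY (account) INTO 32 BUCKETS    -- buckets support sampling and bucketed joins
STORED AS ORC;

-- Load one day's data into its own partition:
-- INSERT OVERWRITE TABLE transactions PARTITION (txn_date='2016-01-01') SELECT ...;
```

Partitioning keeps scans narrow for date-bounded queries, while bucketing fixes the hash layout so joins on the bucket key can avoid a full shuffle.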
Environment: Map Reduce, HDFS, Hive, Pig, HBase, Python, SQL, Sqoop, Flume, Oozie, Impala, Scala, Spark, Apache Kafka, Zookeeper, J2EE, Linux Red Hat, HP-ALM, Eclipse, Cassandra, Talend, Informatica.
Confidential, Eagan, MN
- Involved in the complete big data flow of the application, from data ingestion from upstream into HDFS through processing and analyzing the data in HDFS.
- Worked on different big data file formats such as text, SequenceFile, Avro, and Parquet, with Snappy compression.
- Experienced in developing Hive and Scala queries on different data formats such as text and CSV files.
- Working extensively with Hive, Sqoop, MapReduce, shell scripting, Pig, and Python.
- Using Hive join queries to join multiple tables of a source system and load them into Elasticsearch.
- Analyzing the HBase database and comparing it with other open-source NoSQL databases to determine which best suits the current requirement.
- Creating Hive tables and partitioned tables, using Hive indexes and buckets to ease data analytics.
- Experience in HBase database manipulation with structured, unstructured and semi-structured types of data.
- Using SQOOP to move the structured data from MySQL to HDFS, HIVE, PIG and HBase.
- Using Oozie to schedule the workflows to perform shell action and hive actions.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Experience in writing the business logic for defining DAT and CSV files for MapReduce.
- Experience in writing aggregation logic in different combinations to perform complex data analytics for business needs.
- Developing HiveQL scripts to perform incremental loads.
- Using Flume to handle streaming data and loading it into the Hadoop cluster.
- Created shell scripts to ingest files from the Edge Node into HDFS.
- Importing and exporting big data in CDH across the data analytics ecosystem.
- Experience in writing HIVE JOIN Queries.
- Involved in data migration from one cluster to another cluster.
- Using Pig built-in functions to convert fixed-width files to delimited files.
- Worked on creating Map Reduce scripts for processing the data.
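An Oozie workflow chaining a shell action and a Hive action, as described above, might look like this; the application name, paths, and script names are hypothetical:

```xml
<!-- Hypothetical Oozie workflow: shell ingestion step, then a Hive load -->
<workflow-app name="daily-ingest" xmlns="uri:oozie:workflow:0.5">
  <start to="ingest-shell"/>
  <action name="ingest-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <exec>ingest.sh</exec>
      <file>${wfPath}/ingest.sh</file>
    </shell>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>incremental_load.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
```

Each action's `ok`/`error` transitions define the control flow, so a failed ingestion step never triggers the Hive load.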
Environment: Hadoop, HDFS, Hive, Pig, Sqoop, Spark, Scala, MapReduce, Cloudera, NoSQL, HBase, Shell Scripting, Linux.
- Involved in creating tables, indexes, sequences, and constraints, and created stored procedures and triggers to implement business rules.
- Installation of SQL Server on Development and Production Servers, setting up databases, users, roles and permissions.
- Extensively involved in SQL joins, subqueries, tracing, and performance tuning for better-running queries.
- Provided documentation about database/data warehouse structures and updated functional specification and technical design documents.
- Designed and created ETL packages using SSIS to transfer data from heterogeneous sources in different file formats (Oracle, SQL Server, and flat files) to SQL Server destinations.
- Worked on several transformations in Data Flow including Derived column, Slowly Changing Dimension Using SSIS Controls, Lookup, Fuzzy Lookup, Data Conversion, Conditional split and many more.
- Created various reports with drill-down, drill-through, and calculated members using SQL Server Reporting Services.
- Used various report items like tables, sub report and charts to develop the reports in SSRS and upload into Report Manager
- Created complex Stored Procedures, Triggers, Functions, Indexes, Tables, Views, SQL joins and other T-SQL code to implement business rules
- Used Performance Monitor and SQL Profiler to optimize queries and enhance the performance of database servers.
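A stored procedure of the kind described above could be sketched in T-SQL as follows; the procedure, table, and column names are hypothetical:

```sql
-- Hypothetical T-SQL stored procedure enforcing a simple business rule
CREATE PROCEDURE dbo.usp_ApplyPayment
    @AccountId INT,
    @Amount    MONEY
AS
BEGIN
    SET NOCOUNT ON;

    -- Business rule: reject non-positive payment amounts
    IF @Amount <= 0
    BEGIN
        RAISERROR('Payment amount must be positive.', 16, 1);
        RETURN;
    END

    UPDATE dbo.Accounts
    SET    Balance = Balance - @Amount
    WHERE  AccountId = @AccountId;
END
```

Centralizing the rule in the procedure keeps the check consistent across every caller, which is the usual motivation for implementing business rules in T-SQL.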
Environment: MS SQL Server 2012/2008R2/2008, T-SQL, SQL Server Reporting Services (SSRS), SSIS, SSAS, Business Intelligence Development Studio (BIDS), MS Excel, Visual Studio Team Foundation Server, VBScript.
- Analyzing requirements and preparing the Requirement Analysis Document.
- Responsible for developing and modifying the existing service layer based on the business requirements.
- Deploying the application to the JBoss Application Server.
- Study OAuth/JWT/TOTP/SAML series protocol for SSO solution
- Used J2EE and EJB to handle the business flow and functionality.
- Involved in the complete SDLC of the Development with full system dependency.
- Actively coordinated with deployment manager for application production launch.
- Provided support and updates during the warranty period.
- Monitored test cases to verify actual results against expected results.
- Carried out regression testing as part of problem tracking.
Environment: Java, J2EE, EJB, UNIX, XML, Workflow, JMS, JIRA, Oracle, JBoss, SOAP, JavaScript, WebLogic Server