- 6+ years of experience across the full Software Development Life Cycle (SDLC) using Agile methodology, including analysis, design, development, testing, implementation and maintenance in Hadoop, data warehousing, Linux and Java.
- 3+ years of experience providing big data solutions using Hadoop 2.x, HDFS, MR2, YARN, Kafka, Pig, Hive (HCatalog), Sqoop, HBase, Cloudera Manager, Hortonworks, ZooKeeper, Oozie, Hue, CDH5 and HDP 2.x.
- Experienced in big data, Hadoop and NoSQL, including components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, and the MapReduce 2 (YARN) programming paradigm.
- Experienced in building highly scalable big data solutions using multiple Hadoop distributions (Cloudera, Hortonworks) and NoSQL platforms (HBase and Cassandra).
- Good knowledge of Amazon Web Services (AWS), including EMR and EC2, for fast and efficient big data processing.
- Used JIRA to track story progress and follow Agile methodology.
- Involved in creating Spark clusters in HDInsight by provisioning Azure compute resources with Spark installed and configured.
- Deployed a Hadoop cluster in Azure HDInsight to compare scalability and cost-effectiveness; queried the cluster using PowerShell, Hue and the remote console.
- Experience importing and exporting terabytes of data with Sqoop between HDFS/Hive/HBase and relational database systems, in both directions.
- Experienced in using Kafka as a distributed publish-subscribe messaging system.
- Hands-on experience with in-memory data processing using Apache Spark.
- Experience integrating various data sources, including relational sources such as Oracle, DB2, Sybase, SQL Server and MS Access and non-relational sources such as flat files, into a staging area.
- Good experience writing Pig scripts and Hive queries to process and analyze large volumes of data.
- Experience creating, modifying and dropping Teradata objects such as tables, views, join indexes, triggers, macros, procedures, databases, users, profiles and roles.
- Experience extending Hive and Pig core functionality by writing custom UDFs.
- Experience with Azure Data Factory (ADF).
Hadoop Ecosystem: Hadoop, MapReduce, Sqoop, Hive (HCatalog), Oozie, Pig, HDFS, ZooKeeper, Flume, Spark, Kafka
NoSQL Databases: HBase, Cassandra, MongoDB
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI, JavaBeans
Languages: C, C++, Java, SQL, PL/SQL, Pig Latin, HiveQL, Unix shell scripting
Frameworks: MVC, Spring, Hibernate, Struts 1/2, EJB, JMS, JUnit, MRUnit
Databases: Oracle 11g/10g/9i, MySQL, DB2, MS SQL Server
Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Web Services: WSDL, SOAP, Apache CXF, Apache Axis, REST.
Methodologies: Scrum, Agile, Waterfall.
Confidential, New Jersey
- Loaded disparate data sets from different sources into the BDpaas (Hadoop) environment using Sqoop.
- Developed UNIX scripts to create batch-load and driver code for bringing large volumes of data from relational databases into the big data platform.
- Ingested data from one tenant to another.
- Developed Pig scripts to load data into HBase.
- Used Hive queries to create ORC tables that improved performance for reporting.
- Involved in coding and integrating several business-critical modules of the CARE application using Java, Spring, Hibernate and REST web services on the WebSphere application server.
- Involved in a project to provide eligibility, structure and transactional feeds to the River Valley Facets platform, where heritage and neighborhood health plans and related commercial products are maintained and administered.
- Developed web pages using JSP and JSTL to let end users submit rebates online; used XMLBeans to map XML data into Java objects.
- Worked with systems analysts and business users to understand the requirements for feed generation.
- Created Health Allies Eligibility and Health Allies Transactional feed extracts using Hive, HBase, Python and UNIX to migrate feed generation from a mainframe application, CES (Consolidated Eligibility Systems), to the big data platform.
- Used bucketing in Hive to improve the performance of HQL queries.
- Used numerous user-defined functions in Hive to implement complex business logic in feed generation.
- Imported and exported terabytes of data between relational database systems and HDFS using Sqoop.
- Developed custom Writable Java programs to load data into the Cassandra cluster using the Cassandra APIs.
- Developed Spark scripts using Scala shell commands.
- Created a reusable Python script, added to the Hive distributed cache, to generate fixed-width data files using an offset file.
- Moved data from Oracle, Teradata and MS SQL Server into HDFS using Sqoop, and imported flat files in various formats into HDFS.
- Implemented a proof of concept (POC) for Cassandra data center replication as part of the disaster recovery plan.
- Created a MapReduce program that compares current and prior versions of data in HBase to identify transactional updates. These updates are loaded into Hive external tables, which are in turn referenced by the Hive scripts that generate the transactional feeds.
- Followed Agile methodology using Rally.
Environment: MapR, Sqoop, Hive, Pig, Python, UNIX, HBase, Spark, Cassandra, Rally.
- Working in an Agile team, completed stories related to data ingestion, transformation and publication on time.
- Wrote MapReduce code for Omniture (a hit-data capture tool) and other business scenarios.
- Wrote a Storm topology to consume events published by a Kafka producer and write them to Cassandra.
- Used various Java APIs, including JAX-RS and JDBC, along with AJAX.
- Ingested data from Teradata into Hadoop using Sqoop imports, and performed validations and consolidations on the imported data.
- Wrote a Java program to retrieve data from HDFS and expose it through REST services.
- Ingested data sets from different databases and servers using the Sqoop import tool and the MFT (Managed File Transfer) inbound process.
- Handled requirement gathering, analysis, design, UI prototyping and development.
- Designed and implemented large-scale pub-sub message queues using Apache Kafka.
- Wrote Pig Latin scripts and Hive queries using Avro schemas to transform datasets in HDFS.
- Wrote a custom RecordReader for MapReduce programs.
- As part of support, responsible for troubleshooting MapReduce, Pig and Hive jobs.
- Tuned the performance of Hive and Pig jobs.
- Used Maven to build JARs for MapReduce programs and Pig and Hive UDFs.
- Designed and implemented an ETL process using Talend to load data.
- Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes in both directions, and loaded data into HDFS.
Environment: MapReduce, Hive, Pig, Sqoop, Tableau, MFT, Oozie, Flume, Linux.
Confidential, Redmond, WA
Sr. Hadoop and Spark Developer
- Implemented Spark Streaming to receive real-time data from the push agents and write it to Event Hub.
- Imported data from source systems for data cleansing and transformation.
- Created Spark batch jobs to move data from Event Hubs to the anomaly detection engine.
- Developed Spark Streaming, Spark SQL and batch jobs in Scala.
- Wrote Spark SQL queries for faster request processing.
- Wrote HiveQL queries to analyze data.
- Converted data to Parquet format and read data back from Parquet.
- Used the Oozie workflow engine to orchestrate and run workflow jobs.
- Created RDDs and DataFrames and applied various actions and transformations.
- Developed the website using HTML5, CSS, JavaScript, C# and a SQL Server database in Visual Studio 2015, following an MVC architecture.
- Defined the data flow within the Hadoop ecosystem and guided the team through the implementation.
- Developed custom UDFs and implemented Pig scripts.
- Involved in creating Azure Data Factory (ADF) pipelines for job execution.
- Developed PowerShell scripts to automate U-SQL scripts and deploy them to the ADL Storage account.
- Supported data analysts in running Hive queries for further anomaly analysis.
- Specified the cluster size, resource pool allocation and Hadoop distribution by writing specification files in JSON format.
- Reviewed designs, code, test plans and test results.
- Coded numerous Web API REST services using C#.
- Worked extensively with Agile Scrum and test-driven development.
- Involved in developing required classes and interfaces using C# .NET.
Environment: Apache Hadoop, Spark, Scala, IntelliJ, SBT, C#/.NET, HDInsight, Azure Data Factory, Event Hub, Data Lake, Oozie.
Confidential, Charlotte, NC
Hadoop Developer/Big Data Cloud Engineer
- Analyzed data on the Hadoop cluster using big data analytic tools including Kafka, Pig, Hive and MapReduce.
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Imported unstructured data into HDFS using Flume.
- Worked hands-on with the ETL process and developed Hive/Impala scripts to extract, transform and load data into other data warehouses.
- Used the HBase Java API in a Java application.
- Ran ad hoc queries using Pig Latin, Hive or Java MapReduce.
- Streamed data in real time using Spark with Kafka.
- Wrote Apache Spark Streaming applications on the big data distribution in the active cluster environment.
- Implemented Spark jobs using Scala and Spark SQL for faster testing and processing of data.
- Used the Spark API on Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Coordinated with other functional teams to resolve data load issues in Teradata or Vertica.
- Performed daily operational reporting using the Teradata Decision Expert tool.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Loaded data from multiple sources on AWS S3 cloud storage.
- Continuously monitored and managed the Elastic MapReduce (EMR) cluster through the AWS console.
- Used Cassandra as a NoSQL database.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Imported and exported data into and out of HDFS using Sqoop and Kafka.
Environment: Python, Hortonworks, MapReduce, Hive, HBase, Flume, Impala, Sqoop, Pig, ZooKeeper, Cassandra, Java, ETL, SQL Server, CentOS, UNIX, Linux, Windows 7/Vista/XP.
- Designed use case, class, sequence and object diagrams using IBM Rational Rose to model the detailed design of the application.
- Designed user screens in HTML according to user requirements.
- Used Spring-Hibernate integration in the back end to fetch data from Oracle and MySQL databases.
- Used Spring dependency injection to provide loose coupling between layers.
- Implemented the Web Service client for the login authentication, credit reports and applicant information.
- Used SOAP web services to transmit large blocks of XML data over HTTP.
- Used the Hibernate object-relational mapping (ORM) framework to persist and retrieve data from the database.
- Wrote SQL queries, stored procedures and triggers to perform back-end database operations in SQL Server 2005.
- Implemented the logging mechanism using the Log4j framework.
- Wrote test cases in JUnit for unit testing of classes.
- Developed the application to run on Windows XP.
- Developed the application using the Eclipse IDE.
- Installed WebLogic Server to handle HTTP requests and responses.
- Used Subversion for version control and created automated build scripts.