- 9+ years of IT experience in systems development, databases and analytics, including 4 years of Big Data/Hadoop development, as well as design and development of Java-based enterprise applications.
- Expertise in Hadoop, Spark, Scala, Java and Hadoop ecosystem components such as HDFS, MapReduce, YARN, Hive, Pig, Impala, Sqoop, Flume, ZooKeeper, Oozie, Hue, HBase and NiFi.
- Hands-on experience using Sqoop to capture data from existing relational databases (Oracle, MySQL, Teradata) that provide SQL interfaces.
- Experience working with the Cloudera (CDH4 and CDH5) and Hortonworks Hadoop distributions.
- Hands-on experience with SequenceFile, RCFile, Avro, Parquet and JSON formats, and with combiners, counters, dynamic partitions and bucketing for best practices and performance improvement.
- Experience in the design, development and implementation of Big Data applications.
- Skilled in developing MapReduce programs with the Java API and in using Hive and Pig for data analysis, data cleaning and data transformation.
- Extensive experience developing Pig Latin scripts and using Hive Query Language (HiveQL) for data analytics.
- Developed Apache Pig UDFs to transform date-related data into application-compatible formats.
- Expert in creating Pig and Hive UDFs in Java for efficient data analysis.
- Experience in Java, J2EE, web services, SOAP, HTML and XML-related technologies, demonstrating strong analytical and problem-solving skills, computer proficiency and the ability to follow projects through from inception to completion.
- Extensive experience with Oracle, DB2, SQL Server and MySQL databases, and with core Java concepts such as OOP, multithreading, collections and I/O.
- Strong experience in real-time data analytics using Spark Streaming, Kafka and Flume.
- Exposure to Spark Streaming, Spark MLlib and Scala, including creating and processing DataFrames in Spark with Scala.
- Hands-on experience with Spark SQL queries and DataFrames: importing data from data sources, performing transformations and read/write operations, and saving the results to output directories in HDFS.
- Good working knowledge of the Eclipse and IntelliJ IDEs for developing and debugging Java applications.
- Experience using the Oozie workflow scheduler, together with ZooKeeper, to manage Hadoop jobs as Directed Acyclic Graphs (DAGs) of actions with control flows.
- Successfully migrated a legacy application to a Big Data application in production using Hive, Pig and HBase.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience importing and exporting data between relational database systems and HDFS using Sqoop.
- Experience integrating Hive queries into the Spark environment using Spark SQL.
- Hands-on experience installing, configuring and maintaining Hadoop clusters.
- Experience designing both time-driven and data-driven automated workflows using Oozie.
- Good understanding of NoSQL databases such as HBase.
- Experience with AWS services: S3, EC2 and Redshift.
- Used HiveQL to analyze data and identify correlations.
- Experience developing custom UDFs for Pig and Hive to incorporate Python methods and functionality into Pig Latin and HiveQL.
- Wrote MapReduce programs in Python using the Hadoop Streaming API.
- Good knowledge of data profiling using Informatica Data Explorer.
- Extensive experience in ETL design and development.
- Good understanding of project management Knowledge Areas and Process Groups.
- Well versed in OLTP data modeling, with strong knowledge of entity-relationship concepts.
- Experience in data cleaning and preprocessing using Python scripting.
- Good experience in all phases of the Software Development Life Cycle (requirements analysis, design, development, verification and validation, deployment).
- Strong database knowledge of SQL and PL/SQL programming and RDBMS concepts.
- Extensively involved in writing Oracle SQL queries and PL/SQL stored procedures, functions, packages, triggers and cursors, with query optimization, as part of the ETL development process.
- Experience as a Java developer in client/server technologies using J2EE Servlets, JSP, JDBC and SQL.
- Knowledge of handling Hive queries through Spark SQL integrated with the Spark environment.
- Hands-on experience with UNIX and shell scripting for automation.
- Able to work effectively both in a team and independently, with excellent interpersonal, technical and communication skills.
Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, ZooKeeper, Impala, Hue, Flume, HBase, Spark, Scala, Kafka, NiFi.
Databases: Oracle SQL, PL/SQL, Teradata, MySQL 5.0, MS SQL Server, HBase, Cassandra
ETL/BI Tools: MSBI, Talend, Informatica PowerCenter 9.x/8.6, OBIEE
IDE: Eclipse, Rational Web Application Developer, NetBeans, TextPad
App/Web Servers: Apache Tomcat Server, Apache / IBM HTTP Server, WebSphere Application Server 6.1/7.0
Messaging & Web Services: SOAP, REST, WSDL, UDDI, JMS and XML
Methodologies: Agile, Scrum, Waterfall model, Spiral model, SDLC
Confidential, Irving, TX
- Analyzed the Hadoop cluster and various big data analytics tools, including MapReduce, Hive and Spark.
- Involved in reviewing functional and non-functional requirements.
- Responsible for designing and implementing the data pipeline using Big Data tools including Hive, Spark, Scala and StreamSets.
- Used Spark, Apache NiFi, Kafka and Flume to create data streaming solutions.
- Migrated complex MapReduce programs to Spark RDD transformations and actions.
- Created and ran Sqoop jobs with incremental loads to populate Hive external tables.
- Developed and implemented Apache NiFi across various environments, and wrote QA scripts in Python to track files.
- Involved in importing data from Microsoft SQL Server and Teradata into HDFS using Sqoop.
- Good knowledge of using Apache NiFi to automate data movement.
- Developed Sqoop scripts to import data from relational sources and handled incremental loading.
- Extensively used StreamSets Data Collector to create ETL pipelines for pulling data from RDBMS systems into HDFS.
- Implemented the data processing framework using Scala and Spark SQL.
- Implemented performance optimization methods to reduce data processing time.
- Created shell scripts to automate jobs.
- Extensively worked with DataFrames and Datasets using Spark and Spark SQL.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it; exported result sets from Hive to MySQL using shell scripts.
- Developed a generic utility in Spark for pulling data from RDBMS systems using multiple parallel connections.
- Integrated existing HiveQL code logic into the Spark application for data processing.
- Extensively used Hive/Spark optimization techniques such as partitioning, bucketing, map joins, parallel execution, broadcast joins and repartitioning.
Environment: Spark, Scala, Hive, Hue, UNIX Scripting, Spark SQL, StreamSets, Impala, Beeline, Kafka, Git, Tidal.
Confidential, Overland Park, KS
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using Spark, Hive and HBase.
- Involved in creating a data lake by extracting customer data from various sources into HDFS, including CSV files, databases and server logs.
- Configured Spark Streaming to consume ongoing data from Kafka and store the streamed data in HDFS.
- Used various Spark transformations and actions to cleanse the input data.
- Developed shell scripts to generate Hive CREATE TABLE statements from the data and load the data into the tables.
- Wrote MapReduce jobs using the Java API and Pig Latin.
- Loaded data into Hive using Sqoop and used HiveQL to analyze the partitioned and bucketed data; executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
- Developed Spark applications in Scala and Python, using Apache Spark to process data from various streaming sources.
- Imported data from SQL Server to HDFS using a Python-based Sqoop framework.
- Exported data from HDFS to MySQL using a Python-based HAWQ framework.
- Developed Java applications: one parses the mainframe report into CSV files, and another compares data from SQL Server against the mainframe report and generates a rip file.
- Documented the technical design and wrote the production support document.
- Involved in creating workflows in Tidal (the workflow coordinator for Waddell & Reed).
- Created Hive external tables with partitioning and bucketing to load incremental data coming from SQL Server.
- Optimized MapReduce jobs to use HDFS efficiently by applying various compression mechanisms.
- Created Hive tables, loaded them with data and wrote Hive queries that run internally as MapReduce jobs.
- Responsible for performing extensive data validation using Hive.
- Implemented partitioning, dynamic partitions and bucketing in Hive for efficient data access.
- Used the Oozie workflow engine to run multiple Hive and Pig jobs.
- Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Involved in designing and developing non-trivial ETL processes within Hadoop using tools such as Pig, Sqoop, Flume and Oozie.
- Used DML statements to perform various operations on Hive tables.
- Developed Hive queries for creating foundation tables from stage data.
- Used Pig as an ETL tool for transformations, event joins, filtering and some pre-aggregations.
- Analyzed the data by running Hive queries and Pig scripts to study customer behavior.
- Developed custom Hive UDFs in Java for further data transformations.
- Performed performance tuning in Hive across all phases.
- Involved in modifying the existing Sqoop and HAWQ frameworks to read a JSON property file and perform load/unload operations between HDFS and MySQL.
- Developed Pig scripts to validate record counts between Sqoop and HAWQ loads.
- Developed custom Java MapReduce counters to track the records processed by MapReduce jobs.
Confidential, Hoffman Estates, IL
- Configured and implemented Flume for efficiently collecting, aggregating and moving large amounts of data to HDFS.
- Developed Flume interceptors for preprocessing of application logs before they are loaded into HDFS.
- Developed multiple MapReduce programs for cleaning the data and preprocessing it for downstream analytics.
- Performed ETL using Pig, Hive and MapReduce to transform transactional data to de-normalized form.
- Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
- Worked extensively on importing metadata into Hive using Sqoop, and migrated existing tables and applications to work on Hive.
- Used compression techniques (Snappy, Gzip) to optimize MapReduce jobs for efficient HDFS use.
- Developed Hive queries for data analysis, extending Hive's features by writing custom UDFs and SerDes.
- Created partitioned, bucketed Hive tables and loaded data into the respective partitions at runtime for quick downstream access.
- Involved in configuring the Solr index pipelines to enable real time indexing for our recommendation engine.
- Bundled multiple independent jobs into runnable Oozie workflow, wrapping them together as one triggerable process.
- Implemented a POC to load data into Cassandra and access it using the Java API.
- Used Cassandra storage APIs to access, analyze and store data in the Cassandra data store.
Environment: Hadoop 2.0, HDFS, MapReduce, Sqoop, Oozie, Pig, Hive, Flume, Ubuntu, Java, Eclipse, XML, JSON, SerDes, custom UDFs, MRUnit, Cassandra.
Hadoop Developer, Hyderabad, India
- Developed different MapReduce applications on Hadoop.
- Mined the locations of social media users in a semi-supervised setting on a Hadoop cluster using MapReduce.
- Implemented single-source shortest path on a Hadoop cluster.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Evaluated the suitability of Hadoop and its ecosystem for the project and implemented various proof-of-concept (POC) applications to support their eventual adoption under the Big Data Hadoop initiative.
- Estimated software and hardware requirements for the NameNode and DataNodes and planned the cluster.
- Participated in gathering requirements from experts and business partners and converted the requirements into technical specifications.
- Extracted the needed data from the server into HDFS and bulk loaded the cleaned data into HBase.
- Wrote MapReduce programs and Hive UDFs in Java where the functionality was too complex.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Involved in loading data from the Linux file system to HDFS.
- Prepared design documents and functional documents.
- Added extra nodes to the cluster as requirements dictated to keep it scalable.
- Developed Hive queries for analysis to categorize different items.
- Assisted application teams in installing Hadoop updates, operating system patches and version upgrades when required.
- Designed and created Hive external tables with partitioning, dynamic partitioning and buckets, using a shared metastore instead of Derby.
- Delivered a Flume POC to handle real-time log processing for attribution reports.
- Maintained system integrity of all subcomponents (primarily HDFS, MapReduce, HBase and Hive).
Environment: Java, Apache Hadoop, HDFS, MapReduce, Pig, Hive, Linux, Sqoop, Flume, Oozie, Cassandra.
Database and ETL Developer
- Designed and developed the interface program to import the item master using SQL*Loader, a PL/SQL package and a concurrent program.
- Designed and customized data models for a data warehouse supporting data from multiple sources in real time.
- Designed/modified/implemented stored procedures, triggers in Oracle 8.0.6 using PL/SQL.
- Improved process performance, speed and error handling by 60%.
- Analyzed and used Constellar Hub to ETL source data for data warehousing.
- Designed, developed, coded, tested, documented, and implemented data modeling features using TOAD and other third party tools.
- Wrote SQL and PL/SQL scripts to extract data from the database and for testing purposes.
- Created primary database storage structures (tablespaces, segments, extents, data files, data blocks) and objects (tables, views and indexes).
- Oversaw troubleshooting of UNIX macros, SQL scripts and SQR reports.
Environment: Oracle 8i/9i/10g, MS SQL Server 2000, PL/SQL, SQL*Loader, TOAD, UNIX shell scripting, MSBI, Windows NT 4.0.
- Involved in the analysis and design phases of the Software Development Life Cycle (SDLC).
- Contributed to the design by preparing UML diagrams in Microsoft Visio.
- Created a POJO layer to facilitate sharing data between the front end and the J2EE business objects.
- Used the Spring framework on the server side and Hibernate for object-relational mapping of the database structure created in Oracle.
- Involved in setting up Hibernate configuration properties and mapping files using generators, associations, inheritance, etc.
- Developed RESTful web services.
- Implemented Message-Driven Beans, using JMS, to asynchronously invoke the provisioning system when a new service request is saved in the database.
- Transformed XML documents using XSL.
- Used GoF Java and J2EE design patterns, including Business Delegate to decouple the presentation and business layers.
- Followed the Agile software development model on this project.
- Used HTML, XHTML, DHTML, JavaScript, AJAX, jQuery, JSP and tag libraries to develop UI/view pages.
- Used Spring Core to define beans for services, entity services and their dependent services.
- Involved in Spring programmatic transaction management using AOP, and used Spring Batch.
- Implemented Batch framework for insurance records processing.
- Used Apache CXF, WSDL, SOAP, Axis and JAX-WS technologies to create web services, generate stubs and consume existing services.
- Involved in developing RESTful web services with Jersey as wrappers that give the mobile channel access to existing business services.
- Used JMS to pass messages as payload to track statuses, milestones and states in the workflows.
- Extensively used GoF Java and J2EE design patterns.
Environment: Java/J2EE, HTML, JS, AJAX, Servlets, JSP, XML, XSLT, XPath, XQuery, WSDL, SOAP, REST, JAX-RS, Jersey, JAX-WS, JNDI, Spring framework (DI, AOP, Batch), Hibernate.
- Developed database applications in MS Access and SQL Server to accelerate insurance claims processing.
- Proficient in implementing complex SQL/Access queries, Schema Designing, Normalization, and Performance Tuning.
- Developed ETL processes.
- Created technical documentation and documented processes on several projects.
- Performed database backup, recovery and disaster recovery procedures.
- Performed data migration using DTS services across different databases, including MS Access, Excel and flat files.
- Analyzed and designed the database as well as the business logic modules.
- Processed and documented the project.
- Installed and configured relevant components to ensure database access.
Environment: SQL Server, MSBI (SSIS), MS Access, Excel and Windows 2000 Professional.