- 8+ years of experience in the IT industry, including hands-on experience with Big Data ecosystem technologies.
- 4+ years of Hadoop experience in the design and development of Big Data applications involving Apache Hadoop MapReduce, HDFS, Hive, HBase, Pig, Oozie, Sqoop, Kafka, Flume, and Spark.
- Expertise in developing solutions around NoSQL databases like MongoDB and Cassandra.
- Experience in installing, configuring, supporting, and monitoring Hadoop clusters using the Apache and Hortonworks (HDP 2.x) distributions.
- Experience with the major Hadoop distributions, including Cloudera and Hortonworks.
- Excellent understanding of Hadoop architecture, including MapReduce MRv1 and MRv2 (YARN).
- Developed multiple MapReduce programs to process large volumes of semi-structured and unstructured data files using different MapReduce design patterns.
- Hands-on experience working with ecosystem components including Hive, Pig, Sqoop, MapReduce, Flume, and Oozie.
- Worked extensively with semi-structured data (fixed-length and delimited files) for data sanitization, report generation, and standardization.
- Expertise in different data-loading techniques (Flume, Sqoop) onto HDFS.
- Experience in importing and exporting data between HDFS and Relational Database Management systems using Sqoop.
- Experience in handling continuous streaming data using Flume and memory channels.
- Expertise in Hadoop administration tasks such as managing clusters and reviewing Hadoop log files.
- Extensive experience in developing and maintaining Big Data streaming applications using Kafka, Storm, Spark, and other Hadoop components.
- Good experience writing complex SQL queries against databases such as Oracle 10g, MySQL, and SQL Server.
- Solid grasp of objects, classes, and their relationships and how to model them; good hands-on experience with the Spring 2.5 framework.
- Knowledge of job workflow scheduling and coordination tools like Oozie and ZooKeeper.
- Expertise in using the ETL tool Informatica to Extract, Transform, and Load data into the warehouse.
- Hands-on experience with Spark using Scala and Python.
- Hands-on experience with Spark architecture and its integrations, including Spark SQL and the DataFrame and Dataset APIs.
- Built a POC on Spark real-time streaming, ingesting data from Kafka into HDFS.
- Hands-on experience with AWS (Amazon Web Services): Elastic MapReduce (EMR), creating and storing data in S3 buckets, and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
- Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari, and using IAM (Identity and Access Management) for creating groups and users.
- Extensively worked with object-oriented analysis, design, and development of software using UML methodology.
- Exposure to methodologies such as Scrum, Agile, and Waterfall.
- Good knowledge of normalization, fact tables, and dimension tables, as well as OLAP and OLTP systems.
- Strong experience in unit testing and system testing in Big Data.
- Experience with version control tools such as SVN and Git (GitHub), JIRA for issue tracking, and Crucible for code reviews.
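As a sketch of the Sqoop import/export work described above, an incremental import command might be assembled by a wrapper like the one below. The connection string, table, and column names are hypothetical, not taken from any actual engagement:

```python
# Sketch: build the argv for a Sqoop incremental import of a hypothetical
# "transactions" table, keyed on a last-modified timestamp column.
# All connection details and column names are illustrative.

def build_sqoop_import(jdbc_url, table, check_column, last_value, target_dir):
    """Return the argv list for an incremental Sqoop import."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--incremental", "lastmodified",   # re-import rows changed since last run
        "--check-column", check_column,
        "--last-value", last_value,
        "--target-dir", target_dir,
        "--merge-key", "id",               # collapse updated rows during the merge
    ]

cmd = build_sqoop_import(
    "jdbc:mysql://dbhost:3306/sales", "transactions",
    "updated_at", "2017-01-01 00:00:00", "/data/staging/transactions",
)
print(" ".join(cmd))
```

In practice such a wrapper would hand the argv to `subprocess.run` on an edge node; building the command as a list avoids shell-quoting issues with the timestamp value.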
Confidential, Chicago, IL
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Developed Sqoop scripts to import/export data from relational sources and handled incremental loading of customer and transaction data by date.
- Migrated an existing Java application into microservices using Spring Boot and Spring Cloud.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
- Performed functional and performance testing of solutions.
- Maintained documentation of production schedules and production run-books, and assisted in documenting operational best practices.
- Imported data from different sources such as HDFS and HBase into Spark RDDs.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Experienced in implementing Spark RDD transformations and actions to support business analysis.
- Migrated HiveQL queries on structured data into Spark SQL to improve performance.
- Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
- Worked on partitioning Hive tables and running the scripts in parallel to reduce their run-time.
- Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet, JSON, and CSV.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
- Developed a script in Scala to read all the Parquet tables in a database and parse them as JSON files, and another script to parse them as structured tables in Hive.
- Configured ZooKeeper for cluster coordination services.
- Developed a unit-test script that reads a Parquet file for testing PySpark on the cluster.
- Administered, installed, upgraded, and managed distributions of Hadoop, Hive, and HBase.
- Involved in performance troubleshooting and tuning of Hadoop clusters.
- Implemented business logic by writing Hive UDFs in Java.
- Wrote XML workflow definitions to build Oozie functionality.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Extensively worked on creating end-to-end data pipeline orchestration using Oozie.
- Involved in exploring new technologies such as AWS, Apache Flink, and Apache NiFi that could increase business value.
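A minimal sketch of the serialization-format conversions listed above, limited to the JSON and CSV formats (Avro and Parquet require third-party libraries); the record layout is hypothetical:

```python
import csv
import io
import json

# Hypothetical record of the kind converted between formats in the pipeline.
record = {"id": 42, "name": "alice", "amount": 19.99}

# JSON: self-describing; one object (or one line, for JSON Lines) per record.
as_json = json.dumps(record, sort_keys=True)

# CSV: positional, so a header row has to carry the schema.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "amount"])
writer.writeheader()
writer.writerow(record)
as_csv = buf.getvalue()

# Round-trip check: CSV parsing yields strings, which is one reason
# schema-aware formats like Avro/Parquet are preferred for typed data.
row = next(csv.DictReader(io.StringIO(as_csv)))
assert row["id"] == "42" and json.loads(as_json)["id"] == 42
```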
Environment: MapReduce, HDFS, Spring Boot, Microservices, AWS, Hive, Pig, SQL, Sqoop, Oozie, PySpark, Shell scripting, Cron jobs, Apache Kafka, J2EE.
Confidential, Riverside, CA
- Responsible for writing MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on an EC2 server, and for loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed a process for Sqooping data from multiple sources such as SQL Server, Oracle, and Teradata.
- Responsible for creating the source-to-destination field mapping document.
- Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
- Developed Oozie workflows for executing Sqoop and Hive actions.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Performed optimizations on Spark/Scala code; diagnosed and resolved performance issues.
- Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
- Developed scripts to run Oozie workflows, capture the logs of all jobs run on the cluster, and create a metadata table specifying the execution time of each job.
- Developed Hive scripts for performing transformation logic and for loading data from the staging zone to the final landing zone.
- Worked with the Parquet file format for better storage and performance of published tables.
- Involved in loading transactional data into HDFS using Flume for Fraud Analysis.
- Developed a Python utility to validate HDFS tables against source tables.
- Designed and developed UDFs to extend functionality in both Pig and Hive.
- Imported and exported data using Sqoop between MySQL and HDFS on a regular basis.
- Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications.
- Used the CA7 tool to set up dependencies at each level (table data, file, and time).
- Automated all the jobs that pull data from the FTP server and load data into Hive tables using Oozie workflows.
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
- Migrated the needed data from Oracle and MySQL into HDFS using Sqoop, and imported various formats of flat files into HDFS.
- Helped with the sizing and performance tuning of the Cassandra cluster.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs.
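The HDFS-vs-source validation utility mentioned above can be outlined as a per-table row-count comparison; the table names and counts below are hypothetical stand-ins for what would come from JDBC and Hive queries:

```python
# Sketch of a table-validation pass: compare row counts pulled from the
# source RDBMS against counts computed over the HDFS copies.

def validate_tables(source_counts, hdfs_counts):
    """Return a list of (table, source_count, hdfs_count) mismatches."""
    mismatches = []
    for table, src in source_counts.items():
        hdfs = hdfs_counts.get(table)   # None flags a table missing from HDFS
        if hdfs != src:
            mismatches.append((table, src, hdfs))
    return mismatches

source = {"customers": 1000, "transactions": 52341, "products": 310}
hdfs = {"customers": 1000, "transactions": 52299, "products": 310}
print(validate_tables(source, hdfs))  # [('transactions', 52341, 52299)]
```

A production version would typically also compare per-column checksums, since matching counts alone do not prove the rows are identical.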
Environment: Hortonworks HDP 2.5, MapReduce, AWS, Cassandra, PySpark, HDFS, Hive, Pig, SQL, Ambari, Sqoop, Flume, Oozie, HBase, Java (JDK 1.6), Eclipse, MySQL, and Unix/Linux.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from an Oracle database into HDFS using Sqoop.
- Gathered business requirements, defined and designed the data sourcing and data flows, and performed data quality analysis, working in conjunction with the data warehouse architect on the development of logical data models.
- Created, developed, modified, and maintained database objects, PL/SQL packages, functions, stored procedures, triggers, views, and materialized views to extract data from different sources.
- Extracted data from various locations and loaded it into Oracle tables using SQL*Loader.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Developed the Pig Latin code for loading, filtering and storing the data.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Successfully managed the Extraction, Transformation, and Loading (ETL) process, pulling large volumes of data from various sources, including MS Access and Excel, into a staging database using BCP.
- Was responsible for detecting errors in the ETL operation and rectifying them.
- Incorporated error redirection during ETL loads in SSIS packages.
- Implemented various types of SSIS transformations in packages, including Aggregate, Fuzzy Lookup, Conditional Split, Row Count, and Derived Column.
- Implemented the master-child package technique to manage large ETL projects efficiently.
- Involved in unit testing and system testing of the ETL process.
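The user-behavior analysis described above (Hive queries and Pig scripts over Flume-collected logs) amounts to grouped aggregation; a pure-Python sketch over hypothetical log lines:

```python
from collections import Counter

# Hypothetical web-log lines of the shape Flume might stage in HDFS:
# "<timestamp> <user> <action>". Real field layouts vary by source.
log_lines = [
    "2016-03-01T10:00:01 u1 view",
    "2016-03-01T10:00:05 u2 view",
    "2016-03-01T10:00:09 u1 click",
    "2016-03-01T10:01:12 u1 view",
]

# Equivalent in spirit to: SELECT user, COUNT(*) FROM logs GROUP BY user;
events_per_user = Counter(line.split()[1] for line in log_lines)
print(events_per_user.most_common())  # [('u1', 3), ('u2', 1)]
```

At cluster scale the same grouping runs as a Hive query or Pig script; the Python version is only meant to show the shape of the computation.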
Environment: Hadoop, MapReduce, HDFS, Hive, Oracle 11g, Java, Struts, Servlets, HTML, XML, SQL, J2EE, JUnit, Tomcat 6, MS SQL Server 2005/08, SSIS.
Confidential, Boston, MA
- Responsible for the analysis, documenting the requirements and architecting the application based on J2EE standards. Followed test-driven development (TDD) and participated in scrum status reports.
- Provided full-SDLC application development services, including designing, integrating, testing, and deploying enterprise mission-critical billing solutions.
- Participated in designing Use Case, Class, and Sequence diagrams for various engine components, and used IBM Rational Rose for generating the UML notations.
- Developed Ant, Maven, and shell scripts to automatically compile, package, deploy, and test J2EE applications on a variety of WebSphere platforms.
- Experience in developing business applications using JBoss, WebSphere, and Tomcat.
- Used Perl scripting, shell scripting, and PL/SQL programming to resolve business problems of various natures.
- Implemented client-side and server-side validations according to business needs; wrote test cases, performed unit testing, and wrote and executed JUnit tests.
- Wrote Ant scripts for project builds in the Linux environment.
- Involved in production implementation and post-production support.
- Involved in complete software development life cycle management using UML.
- Coded interfaces for web services.
- Developed the application using Spring MVC Web Flow modules.
- Implemented the Spring framework for application transaction management.
- Created database connections using the Hibernate SessionFactory, using Hibernate APIs to retrieve and store data to the database with Hibernate transaction control.
- Involved in the development of page flows, business objects, Hibernate database mappings, and POJOs; used XML-style syntax for defining object-relational metadata.
- Used Spring ORM to integrate the Spring Framework with Hibernate and JPA.
Environment: Java 7, Spring Framework 3.0, Hibernate, Java 1.6, DHTML, HTML, CSS, Servlets, UML, J2EE, JSP, EJB, Struts Framework, Tiles, SQL.
- Developed the various action classes to handle requests and responses.
- Involved in the design of the Referential Data Service module to interface with various databases using JDBC.
- Used the Hibernate framework to persist employee work hours to the database.
- Developed classes and interfaces for the underlying web services layer.
- Prepared documentation and participated in preparing the user manual for the application.
- Prepared use cases, business process models, data flow diagrams, and user interface models.
- Gathered and analyzed requirements for Auto; designed process flow diagrams.
- Defined business processes related to the project and provided technical direction to development workgroup.
- Analyzed the legacy and the Financial Data Warehouse.
- Participated in database design sessions and database normalization meetings.
- Managed Change Request Management and Defect Management.
- Managed UAT testing and developed test strategies, test plans, reviewed QA test plans for appropriate test coverage.
- Involved in developing JSPs, action classes, form beans, response beans, and EJBs.
- Extensively used XML to code configuration files.
- Developed PL/SQL stored procedures, triggers.
- Performed functional, integration, system and validation testing.