- 7 years of extensive IT experience steering projects from inception to delivery; passionate about turning data into products, actionable insights, and meaningful stories.
- Strong experience working with Apache Hadoop ecosystem, Apache Spark and AWS.
- Excellent understanding of the Hadoop ecosystem, including HDFS, MapReduce, Hive, Pig, Spark, Kafka, Impala, YARN, HBase, Oozie, ZooKeeper, Flume, and Sqoop-based Big Data platforms.
- Experienced with NoSQL databases like HBase and Cassandra.
- Worked with Big Data distributions Cloudera CDH5, CDH4, CDH3, and Hortonworks.
- Comprehensive experience in building Web-based applications using J2EE frameworks like EJB and Struts.
- Excellent ability to use analytical tools to mine data and evaluate the underlying patterns.
- Assisted in Cluster maintenance, Cluster Monitoring, Managing and Reviewing data backups and log files.
- Hands-on experience developing MapReduce programs using Apache Hadoop to analyze Big Data.
- Expertise in optimizing traffic across the network using Combiners, joining multiple schema datasets using Joins and organizing data using Partitioners and Buckets.
- Expertise in composing MapReduce pipelines with many user-defined functions using Apache Pig.
- Implemented business logic by writing Pig Latin UDFs in Java and used various UDFs from Piggybank and other sources.
- Expertise in Hive Query Language (HiveQL), Hive Security and debugging Hive issues.
- Responsible for performing extensive data validation using HIVE Dynamic Partitioning and Bucketing.
- Experience in developing custom UDFs for Pig and Hive to incorporate Java methods and functionality into Pig Latin and HQL (HiveQL).
- Worked with different table types, such as external tables and managed tables.
- Hands-on experience with data ingestion tools like Sqoop, Kafka, Flume and workflow management tools Oozie.
- Hands-on experience handling different file formats like JSON, Avro, ORC, and Parquet and compression techniques like Snappy.
- Experience in scheduling MapReduce/Hive jobs using Oozie.
- Experience in ingesting large volumes of data into Hadoop using Sqoop.
- Analyzed data by performing Hive queries and used Hive UDFs for complex querying.
- Experience in writing real-time query processing using Cloudera Impala.
- Acted as SME and Module Lead for the major projects undertaken.
- Worked with Apache Spark for quick analytics on object relationships.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Managed and reviewed Hadoop log files. Used Scala to integrate Spark with Hadoop.
- Developed Pig scripts and loaded data from RDBMS servers into Hive using Sqoop.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Extensive knowledge of RDD transformations, DataFrame transformations in Spark.
- Experience in building clusters on AWS using Amazon EC2 services and Cloudera manager.
- Experience in Big Data platforms like Hortonworks, Cloudera, Amazon AWS, and Apache.
- Complete domain and development Life Cycle knowledge of Data Warehousing & Client Server Concepts and knowledge of basic data modeling.
- Extensively used Scala and Spark to improve the performance and optimization of existing algorithms/queries in Hadoop and Hive using Spark Context, Spark SQL (DataFrames and Datasets), and pair RDDs.
- Work successfully in fast-paced, collaborative environments as a smooth team player with excellent interpersonal skills and an exceptional ability to quickly master new concepts and technologies.
- Seasoned in Agile-Scrum methodologies with a focus on perfecting quality and improving data availability.
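The dynamic-partitioning and bucketing bullets above can be sketched in miniature. This is a pure-Python stand-in for how Hive routes rows: the directory naming follows Hive's `col=value` convention, but the CRC32 hash and the column names are simplifying assumptions, not Hive's actual hash implementation.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # analogous to CLUSTERED BY (...) INTO 4 BUCKETS

def partition_path(row):
    # Dynamic partitioning: the partition directory is derived from the
    # row's partition-column value, e.g. .../dt=2024-01-01/
    return f"dt={row['dt']}"

def bucket_id(row):
    # Bucketing: a stable hash of the bucketing column modulo the bucket
    # count picks the bucket file. (CRC32 is a deterministic stand-in
    # for Hive's own hash function.)
    return zlib.crc32(row["user_id"].encode()) % NUM_BUCKETS

rows = [
    {"dt": "2024-01-01", "user_id": "alice"},
    {"dt": "2024-01-01", "user_id": "bob"},
    {"dt": "2024-01-02", "user_id": "alice"},
]

layout = defaultdict(list)
for row in rows:
    layout[(partition_path(row), bucket_id(row))].append(row)

for (partition, bucket), bucketed in sorted(layout.items()):
    print(partition, f"bucket_{bucket:05d}", len(bucketed))
```

The useful property this demonstrates: the same `user_id` always hashes to the same bucket, regardless of which partition the row lands in, which is what makes bucketed map-side joins possible.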
Big Data Ecosystems: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Impala, Oozie, Drill, Kylin, ZooKeeper, Flume, Kafka, Kylo, Elasticsearch, YARN, and Spark.
Programming Languages: C, C#, Java, Scala.
Web Technologies: JavaScript, JSP, J2EE, JDBC.
Databases: Microsoft SQL Server, MySQL, Oracle
NoSQL: HBase, Cassandra, MongoDB
Scripting Languages: PHP, HTML, Python, Shell Scripting.
Tools: Eclipse, IntelliJ IDEA, Maven, and SBT.
Version Control Tools: SVN, Git, GitHub
Platforms: Windows, Linux, CentOS, and macOS.
Hadoop Distributions: Cloudera, Hortonworks
Confidential - Westfield Center, OH
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Installed, configured, and maintained Hortonworks Hadoop clusters for application development.
- Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Responsible for building scalable distributed data solutions using Hadoop cluster environment with Hortonworks distribution on 110 data nodes.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Developed and executed shell scripts to automate jobs and wrote complex Hive queries and UDFs.
- Worked on reading multiple data formats on HDFS using Spark.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed multiple POCs using Spark and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL/Teradata.
- Analyzed the SQL scripts and designed the solution to implement using Spark.
- Involved in loading data from UNIX file system to HDFS, AWS S3.
- Extracted the data from Teradata into HDFS using Sqoop.
- Handled importing of data from various data sources like AWS S3 and Cassandra, performed transformations using Hive, MapReduce, and Spark, and loaded data into HDFS.
- Managed and reviewed Hadoop log files.
- Developed Java Mapper and Reducer programs for complex business requirements.
- Used different data formats (Text format and Avro format) while loading the data into HDFS.
- Performed complex HiveQL queries on Hive tables and Created custom user-defined functions in Hive.
- Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with HiveQL queries.
- Created partitioned tables and loaded data using both static partition and dynamic partition method.
- Performed incremental data movement to Hadoop using Sqoop.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Developed Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive using AWS EMR.
- Worked on the core and Spark SQL modules of Spark extensively.
- Experienced in running Hadoop streaming jobs to process terabytes of data from AWS S3.
- Involved in importing the real-time data to Hadoop using Kafka and implemented Oozie job for daily imports.
Environment: Hadoop, HDFS, Hive, Python, Scala, Spark, SQL, Cassandra, MariaDB, UNIX Shell Scripting, AWS S3, EMR, EC2, Hortonworks HDP 2.4.
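The bullet above about converting Hive/SQL queries into Spark transformations follows a standard pattern: a `GROUP BY` aggregation becomes a `map` to key-value pairs followed by `reduceByKey`. A minimal pure-Python sketch of that shape (no Spark dependency; `reduce_by_key` mimics the RDD operation's semantics, and the sales records are illustrative):

```python
import functools
from collections import defaultdict

def reduce_by_key(pairs, fn):
    # Pure-Python stand-in for Spark's RDD.reduceByKey: group values by
    # key, then fold each group with the supplied associative function.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {k: functools.reduce(fn, vs) for k, vs in grouped.items()}

# SQL equivalent: SELECT region, SUM(amount) FROM sales GROUP BY region
sales = [
    {"region": "NA", "amount": 10},
    {"region": "NA", "amount": 5},
    {"region": "EU", "amount": 7},
]
pairs = [(r["region"], r["amount"]) for r in sales]   # the "map" step
totals = reduce_by_key(pairs, lambda a, b: a + b)     # the "reduceByKey" step
```

In real Spark code the same two steps would run distributed across partitions, which is why the combining function must be associative.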
Confidential - Jacksonville, FL
- Involved in loading data from the Linux file system to HDFS.
- Implemented test scripts to support test driven development and continuous integration.
- Responsible to manage data coming from different sources.
- Load and transform large sets of structured, semi-structured and unstructured data.
- Designed a conceptual model with Spark for performance optimization.
- Experience in managing and reviewing Hadoop log files, managing and scheduling Jobs on a Hadoop cluster.
- Worked on Hive for exposing data for further analysis and for generating transforming files from different analytical formats to text files.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading with data and writing hive queries which will run internally in MapReduce way.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Monitored running MapReduce programs on the cluster and loaded data from UNIX file systems to HDFS.
- Installed and configured Hive and made use of Hive UDFs.
- Implemented workflows using the Apache Oozie framework to automate tasks.
- Managed IT and business stakeholders and conducted assessment interviews and solution review sessions.
- Experienced in Agile programming and accomplishing tasks to meet deadlines.
- Used Pig as an ETL tool to do transformations, event joins, filtering of bot traffic, and some pre-aggregations before storing the data onto HDFS.
- Wrote Hive queries for data analysis to meet the business requirements.
- Involved in writing Hive scripts to extract, transform, and load the data into the database.
- Used JIRA for bug tracking and used Git for version control.
- Collected and aggregated large amounts of log data using Apache Flume and staged data in HDFS for further analysis.
- Worked with ZooKeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
- Designed and built the Reporting Application, which uses the Spark SQL to fetch and generate reports on HBase table data.
- Extracted the needed data from the server into HDFS and Bulk Loaded the cleaned data into HBase.
- Reviewed the developed code and flagged any issues with respect to customer data.
- Used SQL queries and other tools to perform data analysis and profiling.
- Mentored and trained the engineering team in the use of the Hadoop platform, analytical software, and development technologies.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Shell Scripting, Java 6 (JDK 1.6), Eclipse, Control-M scheduler, Oracle 10g, PL/SQL, SQL*Plus, Toad 9.6, Linux, JIRA, CVS.
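The Pig ETL bullet in this role (filter bot traffic, then pre-aggregate before landing on HDFS) has the classic FILTER → GROUP → COUNT shape. A pure-Python sketch of that pipeline; the `user_agent`/`page` field names and the bot-marker list are illustrative assumptions, not the production schema:

```python
from collections import Counter

BOT_MARKERS = ("bot", "crawler", "spider")  # assumed bot UA signatures

def is_bot(record):
    # Mirrors a Pig FILTER predicate over the user-agent field.
    ua = record.get("user_agent", "").lower()
    return any(marker in ua for marker in BOT_MARKERS)

def pre_aggregate(records):
    # FILTER out bots, then GROUP BY page and COUNT(*) - the same shape
    # as a Pig FILTER + GROUP + FOREACH ... GENERATE COUNT pipeline.
    human = (r for r in records if not is_bot(r))
    return Counter(r["page"] for r in human)

events = [
    {"page": "/home", "user_agent": "Mozilla/5.0"},
    {"page": "/home", "user_agent": "Googlebot/2.1"},
    {"page": "/docs", "user_agent": "Mozilla/5.0"},
]
page_views = pre_aggregate(events)
```

Doing the filter before the aggregation, as here, is the point of the original bullet: it shrinks the data before the expensive group/shuffle step.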
Confidential - Austin, TX
- Worked on requirement gathering, analysis, and translate business requirements into the technical design with Hadoop Ecosystem.
- Worked collaboratively to manage build outs of large data clusters and real-time streaming.
- Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements; experienced in using Sqoop to import and export data from Oracle and MySQL.
- Developed various complex Hive queries per business logic.
- Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive queries.
- Maintained Bitbucket repositories for the DevOps environment: automation code and configuration.
- Created a technical document for integration between Eclipse, Bitbucket, and SourceTree.
- Created JIRA projects integrating workflows, screen schemes, field configuration schemes, permission schemes, project roles, and notification schemes.
- Experienced in Agile processes and delivered quality solutions in regular sprints.
- Responsible for handling Streaming data from web server console logs.
- Designed docs and specs for the near real-time data analytics using Hadoop and HBase.
- Analyzed the SQL scripts and designed the solution to implement using Scala.
- Responsible for performing extensive data validation using Hive.
- Moved data from HDFS to RDBMS and vice versa using Sqoop.
- Work with cross-functional consulting teams within the data science and analytics team to design, develop, and execute solutions to derive business insights and solve clients' operational and strategic problems.
- Monitoring the ticketing tool for any tickets indicating an issue/incident reported and resolving with the appropriate fix in the project.
- Developed a fully automated continuous integration system using Git, Bitbucket, MySQL and custom tools developed in Python.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for small data sets.
- Used CloudWatch Logs to move app logs to S3 and created alarms based on exceptions raised by applications.
- Experienced in writing Hive join queries.
- Involved in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries.
- Continuously monitored and managed the Hadoop/Spark cluster using Cloudera Manager.
- Experience in developing custom UDFs for Hive.
Environment: Java, Scala 2.10.5, Spring 3.0.4, Hive, HDFS, YARN, MapReduce, Sqoop 1.4.3, Flume, UNIX Shell Scripting, Python 2.6, Azure, AWS, Kafka, Bitbucket, Jira, Oracle 11g, Hadoop
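The Sqoop bullets in this role (HDFS to RDBMS and back) are typically run as incremental imports so only new or changed rows move on each schedule. A small sketch that builds the command line for a `lastmodified` incremental import; the JDBC URL, table, and column names are placeholders, and the argv would be handed to a scheduler or `subprocess.run` rather than executed here:

```python
def sqoop_import_cmd(jdbc_url, table, target_dir, check_column, last_value):
    # Builds the argv for an incremental Sqoop import. With
    # --incremental lastmodified, Sqoop only pulls rows whose
    # check-column value is newer than --last-value.
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--incremental", "lastmodified",
        "--check-column", check_column,
        "--last-value", last_value,
    ]

# Hypothetical connection details for illustration only.
cmd = sqoop_import_cmd(
    "jdbc:mysql://db.example.com/sales",
    "orders",
    "/warehouse/orders",
    "updated_at",
    "2024-01-01 00:00:00",
)
```

After each run, Sqoop reports the new high-water mark for `--last-value`, which the next scheduled run (e.g. an Oozie coordinator action) feeds back in.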
Confidential - Minneapolis, MN
- Responsible for building scalable distributed data solutions using Hadoop.
- Wrote multiple MapReduce programs in Java for data analysis.
- Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
- Developed Pig scripts for analyzing large data sets in HDFS.
- Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
- Designed and presented a plan on Impala. Experienced in migrating HiveQL to Impala to minimize query response time.
- Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
- Responsible for creating Hive tables, loading structured data resulting from MapReduce jobs into the tables, and writing Hive queries to further analyze the logs to identify issues and behavioral patterns.
- Worked on sequence files, RC files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Implemented daily jobs that automate parallel tasks of loading data into HDFS using Autosys and Oozie coordinator jobs.
- Performed streaming of data into Apache Ignite by setting up a cache for efficient data analysis.
- Responsible for performing extensive data validation using Hive.
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in submitting and tracking MapReduce jobs using JobTracker.
- Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
- Used Pig as ETL tool to do transformations, event joins, filter and some pre-aggregations.
- Responsible for cleansing the data from source systems using Join, Denormalize, Normalize, Reformat, Filter-by-Expression, and Rollup.
- Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
- Implemented Hive Generic UDF's to implement business logic.
- Implemented test scripts to support test driven development and continuous integration.
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Linux, Teradata, ZooKeeper, Autosys, HBase, Cassandra, Apache Ignite.
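The Java MapReduce work in this role follows the standard map/shuffle/reduce flow. A compact sketch of those three phases, in pure Python rather than the original Java so it stays self-contained; the word-count job is the illustrative stand-in, not the actual business logic:

```python
from collections import defaultdict

def map_phase(lines):
    # Mapper: emit a (word, 1) pair per token, the way a Java Mapper
    # writes key-value pairs to its Context.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle/sort: the framework groups all intermediate values by key
    # before any reducer runs.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reducer: fold each key's value list; here, sum the 1s per word.
    return {word: sum(counts) for word, counts in groups.items()}

counts = reduce_phase(shuffle(map_phase(["to be or not", "to be"])))
```

Performance tuning of such jobs (as in the bullet above) largely means shrinking what crosses the shuffle, e.g. running the reducer as a combiner on each mapper's local output first.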
- Involved in Analysis, Design, Development, and Testing of the application.
- Incorporated UML diagrams (Class diagrams, Activity diagrams, Sequence diagrams) as part of design documentation and other system documentation.
- Enhanced the Port search functionality by adding a VPN Extension Tab.
- Created end to end functionality for view and edit of VPN Extension details.
- Used the Agile process to develop the application, as it allows faster development compared to RUP.
- Used Hibernate as the persistence framework.
- Used Struts MVC framework and WebLogic Application Server in this application.
- Involved in creating DAOs and used Hibernate for ORM mapping.
- Implemented using Spring Framework for rapid development and ease of maintenance.
- Wrote procedures and triggers for validating the consistency of metadata.
- Wrote SQL code blocks using cursors for shifting records between various tables based on checks.
- Wrote Java classes to test the UI and Web services through JUnit and JWebUnit.
- Extensively involved in release/deployment related critical activities.
- Performed functional and integration testing and tested the entire application using JUnit and JWebUnit.
- Used Log4J for logging at both the user-interface and domain levels.
- Delivered presentations and demos on Java and J2EE to cross-functional teams and stakeholders.
Environment: JAVA, JSP, servlets, J2EE, EJB, Struts Framework, JDBC, WebLogic Application Server, Hibernate, Spring Framework, Oracle 9i, Unix, Web Services, CVS, Eclipse, JUnit, JWebUnit.
- Developed use case diagrams, object diagrams, class diagrams, and sequence diagrams using UML.
- Implemented User Interface in Model-View-Controller architecture.
- Developed web services using JAX-WS.
- Used CVS as the version control system.
- Supported the application right from Integration tests through System Tests.
- Created unit test cases using JUnit.
- Created Ant build scripts.
Environment: JAVA/J2EE, UML, JSP, XML, XSD, DHTML, CSS, Servlets, JavaScript, Web Services, SOAP, WSDL, Maven, JUnit, Oracle 10g, Ant, ClearCase, Eclipse 3.1, Log4J