- Professional software developer with around 6 years of technical expertise in all phases of the software development life cycle (SDLC), specializing in Big Data technologies such as the Hadoop and Spark ecosystems.
- 3+ years of industry experience in Big Data analytics and data manipulation using Hadoop ecosystem tools including MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Oozie, Sqoop, AWS, NiFi and ZooKeeper.
- Hands-on expertise in row key and schema design for NoSQL databases such as HBase.
- Worked extensively with Spark and Scala on clusters for computational analytics, and built advanced analytical applications on top of Hadoop using Spark with Hive and SQL.
- Excellent Programming skills at a higher level of abstraction using Scala and Python.
- Hands-on experience developing Spark applications using Spark libraries such as Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience in real-time data analytics using Spark Streaming, Kafka and NiFi.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Created Hive tables to store structured data in HDFS and processed it using HiveQL.
- Worked with GUI-based Hive tools such as Hue and Hive View for querying data.
- Experience with Talend Data Management Platform and Talend Enterprise Big Data 6.4.1.
- Extensive experience in ETL methodology for data profiling, data migration, extraction, transformation and loading using Talend; designed data conversions from a wide variety of source systems.
- Extensively used Talend Big Data components such as tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport and tSqoopExport.
- Extensively created mappings in Talend using tMap, tJoin, tReplicate, tParallelize, tJava, tJavaRow, tAggregateRow, tWarn, tLogCatcher, tMysqlScd, tFilter, tGlobalmap etc.
- Experienced in migrating from on-premises infrastructure to AWS using AWS Data Pipeline and AWS Firehose.
- Experience writing Python and shell scripts to spin up EMR clusters.
- Experience writing complex SQL queries, PL/SQL, views, stored procedures and triggers.
- Experience in OLTP and OLAP design, development, testing and support of Data warehouses.
- Experience working with OLAP and with star schema and snowflake schema data warehousing.
- Good experience optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
- Competent in using Chef, Puppet and Ansible configuration and automation tools. Configured and administered CI tools such as Jenkins, Hudson and Bamboo for automated builds.
- Hands-on experience with Hadoop distributions including Cloudera, Hortonworks and Amazon EMR.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
- Used JIRA for tracking issues and code-related bugs, GitHub for code reviews, and version control tools such as CVS, Git and SVN.
- Experienced in checking cluster status using Cloudera Manager, Ambari, Ganglia and Nagios.
- Ability to work with Onsite and Offshore Teams.
- Designed ETL jobs in Talend to load data from HDFS into RDBMS.
- Experience in writing Shell Scripts in Unix/Linux.
- Good experience with use-case development, with methodologies like Agile and Waterfall.
- Good understanding of all aspects of Unit, Regression, Agile, White & Black-box testing.
- Proven ability to manage all stages of project development. Strong problem-solving and analytical skills, with the ability to make balanced and independent decisions.
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Ambari, NiFi.
Cloud Environment: AWS, Google Cloud
Hadoop Distributions: Cloudera, Hortonworks
Languages: Python, Shell Scripting, Scala.
NoSQL Databases: MongoDB, HBase, DynamoDB.
Development / Build Tools: Eclipse, Git, IntelliJ and Log4j.
RDBMS: Oracle 10g, 11g, MS SQL Server, DB2
Testing: MRUnit Testing, Quality Center (QC)
Virtualization: VMWare, Docker, AWS/EC2, Google Compute Engine, Vagrant.
Build Tools: Maven, Ant, Gradle
- Experience in implementing Scala framework code using IntelliJ and UNIX scripting to implement the workflow for the jobs.
- Gathered business requirements, analyzed the use cases and implemented them end to end.
- Worked closely with the Architect; enhanced and optimized product Spark and Scala code to aggregate, group and run data mining tasks using Spark framework.
- Experienced in loading raw data into RDDs and validating the data.
- Experienced in converting the validated RDDs into DataFrames for further processing.
- Implemented Spark SQL logic to join multiple DataFrames and generate application-specific aggregated results.
- Experienced in fine tuning the jobs for better performance in the production cluster space.
- Worked entirely within Agile methodology and used the Rally Scrum tool to track user stories and team performance.
- Worked extensively with Impala through Hue to analyze the processed data and generate the end reports.
- Experienced in working with Hive databases through Beeline.
- Worked with AWS cloud services such as EC2, S3, EBS, RDS and VPC.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Experience developing Docker images and deploying Docker containers in Swarm.
- Implemented Elasticsearch on the Hive data warehouse platform.
- Experience using Avro, Parquet and JSON file formats; developed UDFs in Hive.
- Worked with Log4j framework for logging debug, info & error data.
- Used Amazon DynamoDB to gather and track the event-based metrics.
- Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive.
- Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
- Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Worked with different teams to ensure data quality and availability.
- Responsible for generating actionable insights from complex data to drive real business results for various applications teams and worked in Agile Methodology projects extensively.
- Worked with SCRUM team in delivering agreed user stories on time for every Sprint.
- Worked on analysing and resolving the production job failures in several scenarios.
- Implemented UNIX scripts to define the use case workflow and to process the data files and automate the jobs.
- Knowledge of implementing JILs to automate jobs in the production cluster.
Environment: Spark, Spark Streaming, Spark SQL, Redshift, Python, DynamoDB, HDFS, Hive, Pig, Apache Kafka, Sqoop, Scala, Shell scripting, Linux, Jenkins, Eclipse, Git, Oozie, Talend, SOAP, Nagios and Agile methodology.
Hadoop DevOps Engineer
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala; the work was initially done in Python.
- Developed Spark jobs using Scala on top of Yarn/MRv2 for interactive and Batch Analysis.
- Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets.
- Used Talend Big Data components for Hadoop and S3 buckets, and AWS services for Redshift.
- Utilized Big Data components such as tHDFSInput, tHDFSOutput, tHiveLoad, tHiveInput, tHBaseInput, tHBaseOutput, tSqoopImport and tSqoopExport.
- Designed and Implemented the ETL process using Talend Enterprise Big Data Edition to load the data from Source to Target Database.
- Made extensive use of Elastic Load Balancing with Auto Scaling to scale the capacity of EC2 instances across multiple Availability Zones in a region, distributing high incoming traffic for the application with zero downtime.
- Created Partitioned Hive tables and worked on them using HiveQL.
- Experience monitoring the Hadoop cluster using Cloudera Manager, interacting with Cloudera support, logging issues in the Cloudera portal and fixing them per the recommendations.
- Experience in Cloudera Hadoop upgrades and patches, installation of ecosystem products through Cloudera Manager, and Cloudera Manager upgrades.
- Worked with the Jenkins continuous integration tool and automated end-of-day JAR file builds.
- Developed Unix shell scripts to load a large number of files into HDFS from the Linux file system.
- Experience setting up the application stack and debugging Logstash to send Apache logs to AWS Elasticsearch.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Connected to Impala from the user interface (UI) and queried the results using Impala SQL.
- Used ZooKeeper to coordinate the servers in clusters and to maintain data consistency.
- Worked in an Agile development environment using Kanban methodology. Actively involved in daily Scrum and other design-related meetings.
- Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
- Supported setup and updates of the QA environment for implementing scripts with Pig, Hive and Sqoop.
Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, Impala, Sqoop, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux-Ubuntu, Scala, Tableau, Maven, Jenkins, Cloudera, JUnit, Agile methodology.
Big Data Hadoop Consultant
- Migrated and transformed large sets of structured, semi-structured and unstructured raw data from HBase through Sqoop and placed them in HDFS for processing.
- Wrote multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON and CSV.
- Wrote Flume configuration files for importing streaming log data into HBase.
- Loaded data into HBase using both bulk and non-bulk loads.
- Wrote a Java program to retrieve data from HDFS and provide it to REST services.
- Used Sqoop for large data transfers from RDBMS to HDFS/HBase/Hive and vice versa.
- Implemented partitioning, bucketing in Hive for better organization of the data.
- Involved in using HCatalog to access Hive table metadata from MapReduce or Pig code.
- Created HBase tables, HBase sinks and loaded data into them to perform analytics using Tableau.
- Installed, configured and maintained Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Created multiple Hive tables, ran Hive queries on that data, and implemented partitioning, dynamic partitioning and buckets in Hive for efficient data access.
- Experience in creating tables, dropping and altering at run time without blocking using HBase and Hive.
- Experienced in running batch processes using Pig Latin scripts and developed Pig UDFs for data manipulation according to business requirements.
- Hands-on experience developing optimal strategies for distributing web log data over the cluster, and importing and exporting stored web log data into HDFS and Hive using Sqoop.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager and the Web UI.
- Managed and scheduled several jobs to run over time on the Hadoop cluster using Oozie.
- Used Maven to build JAR files of MapReduce programs and deployed them to the cluster.
- Involved in final data reporting using Tableau for testing, connecting to the corresponding Hive tables through the Hive ODBC connector.
- Performed cluster tasks such as adding and removing nodes without affecting running jobs.
- Maintained System integrity of all sub-components (primarily HDFS, MR, HBase, and Hive).
- Helped in design of Scalable Big Data Clusters and solutions and involved in defect meetings.
- Followed Agile Methodology for entire project and supported testing teams.
Environment: Apache Hadoop, MapReduce, HDFS, HBase, CentOS 6.4, Unix, REST web Services, Hive, Pig, Oozie, JSON, Eclipse, QlikView, Qlik Sense, Jenkins, Maven, Sqoop.
- Involved in the design, development and support phases of the software development life cycle (SDLC).
- Reviewed the functional, design, source code and test specifications.
- Developed the complete front end using JavaScript and CSS.
- Authored functional, design and test specifications.
- Developed web components using JSP, Servlets and JDBC.
- Designed tables and indexes.
- Designed, implemented, tested and deployed Enterprise JavaBeans (both session and entity beans) using WebLogic as the application server.
- Developed stored procedures, packages and database triggers to enforce data integrity. Performed data analysis and created Crystal Reports for user requirements.
- Implemented Backend, Configuration DAO and XML generation modules of DIS.
- Analyzed, designed and developed the component.
- Used JDBC for database access.
- Used Spring Framework for developing the application and used JDBC to map to the Oracle database.
- Used Data Transfer Object (DTO) design patterns.
- Performed unit testing and rigorous integration testing of the whole application.
- Wrote and executed test scripts using JUnit.
Environment: JSP, XML, Spring Framework, Hibernate, Eclipse (IDE), Microservices, JavaScript, Struts, Tiles, Ant, PL/SQL, Windows, UNIX, SOAP, JasperReports.