- 10+ years of experience in the IT industry in analysis, design, development, and maintenance of software applications, mainly on Hadoop (Hortonworks, Cloudera, MapR), across industry verticals including Banking, Financial, Pharmacies, Financial Assets, Fixed Income, Equities, Telecom, and Health Insurance.
- 4 years of work experience as a Hadoop Developer with good knowledge of the Hadoop framework, the Hadoop Distributed File System, and parallel processing implementation.
- Experience with the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, and AWS.
- Excellent understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Hands-on experience in importing and exporting data using the Hadoop data management tool Sqoop.
- Strong experience in writing MapReduce programs for data analysis. Hands-on experience in writing custom partitioners for MapReduce.
- Performed data analysis using Hive and Pig.
- Expert in understanding data and in designing and implementing enterprise platforms such as Hadoop data lakes and large data warehouses.
- In-depth experience in translating key strategic objectives into actionable and governable roadmaps and designs using best practices and guidelines. Worked on all facets of software development life cycle.
- Excellent understanding of Hadoop cluster security; implemented secure Hadoop clusters using Kerberos, Knox, and Ranger.
- Used Unix bash scripts to validate files transferred from Unix to HDFS file systems.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data and managed data coming from different sources.
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components such as Hive, Zookeeper, Sqoop, Flume, Hue, and Spark on Cloudera and Hortonworks Hadoop distributions.
- Maintained, supported, monitored, and upgraded all Hadoop environments, including configuration, access control, capacity planning, permissions, and security patches, to ensure continuity across all Hadoop environments.
Sr. Big Data Consultant
Confidential, Los Angeles, CA
Roles & Responsibilities:
- Involved in designing and architecting Big Data solutions using the Hadoop ecosystem.
- Collaborating to identify current problems, constraints, and root causes in data sets in order to determine descriptive and predictive solutions with the support of Hadoop HDFS, MapReduce, Pig, Hive, and HBase, and developing reports in Tableau.
- Analyzing the Hadoop cluster using different Big Data analytic tools, including Kafka, Sqoop, Storm, Spark, Pig, Hive, and MapReduce.
- Installing/Configuring/Maintaining Hortonworks Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
- Architecting the Hadoop cluster in pseudo-distributed mode, working with Zookeeper and Apache.
- Storing and backing up data from HDFS to Amazon AWS S3, and creating tables in the AWS cluster with S3 storage.
- Using Big Data technologies to produce technical designs, architectures, and blueprints for Big Data implementations, and writing Scala programs using the Spark context.
- Providing technical assistance for configuration, administration and monitoring of Hadoop clusters.
- Involved in loading data from the Linux file system to HDFS and in importing and exporting data into HDFS and Hive using Sqoop and Kafka.
- Implementing partitioning, dynamic partitions, and buckets in Hive, and supporting MapReduce programs running on the cluster.
- Preparing presentations of solutions to Big Data/Hadoop business cases and presenting them to company directors to get the go-ahead on implementation.
- Successfully integrated Hive tables and MongoDB collections and developed a web service that queries a MongoDB collection and returns the required data to the web UI.
- Installing and configuring Hadoop MapReduce and HDFS, and developing multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Using Spark to create APIs in Java and Scala, streaming data in real time using Spark with Kafka, and developing Hive queries, Pig scripts, and Spark SQL queries to analyze large datasets.
- Configuring Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Implementing Storm builder topologies to perform cleansing operations before moving data into Cassandra, and transferring data between Azure HDInsight and databases using Sqoop.
- Debugging and performance tuning Hive and Pig jobs, and implementing test scripts to support test-driven development and continuous integration.
- Developing enhancements to MongoDB architecture to improve performance and scalability.
- Deploying algorithms in Scala with Spark using sample datasets, and doing Spark-based development with Scala.
- Manipulating, cleansing, and processing source data, staging it in final Hive/Redshift tables, and scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
Environment: Big Data, Hadoop, HDFS, Pig, Hive, MapReduce, Azure, Sqoop, Spark, Kafka, Linux, Cassandra, MongoDB, Scala, Storm, Elasticsearch, SQL, PL/SQL, AWS, S3, Informatica, Redshift.
Sr. Big Data Consultant
Confidential, Chicago, IL
- Gathered business requirements from Business Partners and Subject Matter Experts and was involved in the installation and configuration of Hadoop ecosystem components with the Hadoop Admin.
- Worked on architecting solutions that process massive amounts of data on corporate and AWS cloud-based servers.
- Supported MapReduce programs running on the cluster and wrote MapReduce jobs using the Java API.
- Configured a several-node Hadoop cluster (Amazon EC2 Spot Instances) to transfer data from Amazon S3 to HDFS and from HDFS to Amazon S3, and to direct input and output to the Hadoop MapReduce framework.
- Involved in HDFS maintenance and loading of structured and unstructured data, and imported data from mainframe datasets to HDFS using Sqoop.
- Developed various automated scripts for DI (Data Ingestion) and DL (Data Loading) using Python and Java MapReduce.
- Handled importing of data from various data sources (Oracle, DB2, HBase, Cassandra, and MongoDB) to Hadoop, and performed transformations using Hive and MapReduce.
- Developed prototype Spark applications using Spark Core, Spark SQL, and the DataFrame API, and developed several custom user-defined functions in Hive and Pig using Java and Python.
- Developed simple and complex MapReduce programs in Java for Data Analysis on different data formats.
- Imported data into Spark from a Kafka consumer group using Spark Streaming APIs.
- Wrote Hive queries for data analysis to meet business requirements. Implemented custom Kafka encoders for a custom input format to load data into Kafka partitions. Streamed data in real time using Spark with Kafka for faster processing.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala.
- Wrote Python scripts for internal testing that read data from a file and push it into a Kafka queue, which in turn is consumed by the Storm application.
- Installed and configured Flume, Hive, Pig, Sqoop and Oozie on the Hadoop cluster.
- Participated in building a CDH4 test cluster for implementing Kerberos authentication. Upgraded the Hadoop cluster from CDH4 to CDH5 and set up a high-availability cluster to integrate Hive with existing applications.
- Wrote Spark applications in Scala to interact with the MySQL database using the Spark SQLContext and accessed Hive tables using HiveContext.
- Extensively used the Spring and Hibernate frameworks, implemented MVC architecture, and worked on Spring RESTful services with dependency injection.
- Implemented AWS EC2, Key Pairs, Security Groups, Auto Scaling, ELB, SQS, and SNS using the AWS API and exposed them as RESTful web services; implemented reporting and notification services using the AWS API.
- Used AWS (Amazon Web Services) compute servers extensively and created snapshots of EBS volumes. Monitored AWS EC2 instances using CloudWatch.
- Worked on AWS Security Groups and their rules. Worked on Kafka and Kafka mirroring to ensure that data is replicated without any loss.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Involved in migrating Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
- Designed the solution and developed the program for data ingestion using Sqoop, MapReduce, shell scripts, and Python.
- Responsible for migrating tables from traditional RDBMSs into Hive tables using Sqoop and later generating the required visualizations and dashboards using Tableau.
- Documented all Extract, Transform and Load processes; designed, developed, validated, and deployed Talend ETL processes for the data warehouse team using Pig and Hive.
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data. Implemented scripts for loading data from UNIX file system to HDFS.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Implemented Storm builder topologies to perform cleansing operations before moving data into Cassandra.
- Developed a fully customized framework using Python, shell scripts, Sqoop, and Hive, and developed an export framework using Python, Sqoop, Oracle, and MySQL.
- Worked with teams in setting up AWS EC2 instances using different AWS services such as S3, EBS, Elastic Load Balancer, Auto Scaling groups, IAM roles, VPC subnets, and CloudWatch.
- Implemented daily Oozie jobs that automate parallel tasks of loading data into HDFS and pre-processing it with Pig, using Oozie coordinator jobs.
- Involved in importing and exporting data between HDFS and Relational Database Systems like Oracle, MySQL and SQL Server using Sqoop.
- Built a prototype with HDP Kafka and Storm for a clickstream application.
- Updated maps, sessions, and workflows as part of ETL changes, modified existing ETL code, and documented the changes.
- Worked with the existing application, wireframes, and FDN and BRD documents to gather and analyze requirements.
- Hands-on experience with Cassandra, a NoSQL database, to provide scalability.
- Developed Agile processes using Groovy and JUnit to support continuous integration.
- Integrated automated functional tests (Groovy) with continuous integration in Jenkins.
- Parsed requests and built response data using Groovy's JSON tools and Grails web services.
- Imported data from various resources to the Cassandra cluster using Java APIs.
- Used Eclipse SWT for developing the applications.
- Involved in preparing TSD documents with UML diagrams (class, sequence, and use case diagrams) using Microsoft Visio.
- Wrote RESTful services on the server in NodeJS to listen to requests from devices.
- Built a Grails web application that allows admin users to manage detailed data for all types of Target locations.
- Worked with the Standard Widget Toolkit (SWT).
- Converted major Openwork's components to the Eclipse RCP/SWT platform, with support for Swing-SWT components.
- Involved in developing view pages of the desktop portal using HTML, JavaScript, JSP, Struts tag libraries, AJAX, jQuery, GWT, Dojo, XML, and XSLT.
- Developed and deployed web services to interact with partner interfaces, and client interfaces to consume the web services, using CXF, WSDL, SOAP, Axis, and JAX-WS technologies.
- Integrated third-party libraries to augment features lacking or inefficient in ExtJS.
- Developed RESTful web services using Jersey so that they can be invoked by different channels.
- Developed service objects as beans by using Spring IOC/DI.
- Developed Web API using NodeJS and hosted on multiple load balanced API instances.
- Implemented an enterprise application with jQuery, AngularJS, Node.js, and Spring MVC.
- Used Spring Beans to encapsulate business logic and implemented the application's MVC architecture using the Spring MVC framework.
- Implemented Hibernate (ORM Mapping tool) framework to interact with the database to update, retrieve, insert and delete values effectively.
- Used Java Swing for a few components alongside the SWT application in a multithreaded environment, using concurrency and Java Collections.
- Used Ehcache as the second-level cache in Hibernate for the application.
- Involved in passing messages, such as payloads, to track different statuses and milestones using EJB and JMS.
- Involved in unit testing, integration testing, SoapUI testing, smoke testing, system testing, and user acceptance testing of the application.
- Used Spring programmatic transaction management for Java Persistence.
- Involved in integration of Spring and Hibernate frameworks.
- Involved in setting server properties, data sources, JNDI, and queues, and in deploying the application on WebSphere Application Server.
- Followed test-driven development using the JUnit and Mockito frameworks.
- Created continuous integration builds using Maven.
- Involved in fixing QA/UAT/Production issues and tracked them using QC.