- 4.5+ years of IT experience across the complete software development life cycle, applying object-oriented analysis and design with Big Data technologies (Hadoop ecosystem), SQL, Java and J2EE technologies.
- 4.5+ years of strong industry experience in designing, developing, implementing and testing various client/server, web-based and distributed applications.
- Spent the last 3+ years working on Big Data and Data Science, building advanced customer insight and product analytics platforms using Big Data and open source technologies.
- Wide experience in data mining, real-time analytics, business intelligence, machine learning and web development.
- Experienced in working with Hadoop/Big Data storage and analytical frameworks on the Amazon AWS cloud using tools like SSH, PuTTY and MindTerm.
- Experienced in collecting metrics for Hadoop clusters using Ambari & Cloudera Manager.
- Experienced in YARN environments with Storm, Spark, Kafka and Avro.
- Experienced with Scala and Spark, improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Leveraged strong skills in developing applications involving Big Data technologies like Hadoop, Spark, Elasticsearch, MapReduce, YARN, Flume, Hive, Pig, Kafka, Storm, Sqoop, HBase, Hortonworks, Cloudera, Mahout, Avro and Scala.
- Skilled in programming with the MapReduce framework and the Hadoop ecosystem.
- Very good experience in designing and implementing MapReduce jobs to support distributed data processing and process large data sets utilizing the Hadoop cluster.
- Experience in implementing an inverted indexing algorithm using MapReduce.
- Extensive experience in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.
- Hands on experience in migrating complex MapReduce programs into Apache Spark RDD transformations.
- Experience in setting up standards and processes for Hadoop based application design and implementation.
- Implemented a large-scale Hadoop (Hortonworks HDP 2.4 stack) enterprise data lake and HDF NiFi cluster for SIT, DEV, UAT, CERT and PROD environments.
- Upgraded Hortonworks Ambari and the HDP stack from version 2.3 to 2.4 in Dev, DR and Prod environments.
- Responsible for provisioning and managing Hadoop clusters on the Amazon Web Services (AWS) EC2 public cloud for product POCs.
- Teamed diligently with the infrastructure, network, database, application and platform teams to ensure high data quality and availability.
- Configured Chef and Ansible for Hadoop package deployments and other configuration pushes.
- Good exposure to Apache Hadoop MapReduce programming, Pig scripting and HDFS.
- Developed ETL processes to load data from multiple data sources into HDFS using Flume and Sqoop, perform structural modifications using MapReduce and Hive, and analyse data using visualization/reporting tools.
- Experience in writing Pig UDFs (Eval, Filter, Load and Store) and macros.
- Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality.
- Exposure to using Apache Kafka to develop data pipelines that carry logs as a stream of messages using producers and consumers.
- Experience in integrating Apache Kafka with Apache Storm and creating Storm data pipelines for real-time processing.
- Very good understanding of NoSQL databases like MongoDB, Cassandra and HBase.
- Experience in coordinating Cluster services through Zookeeper.
- Hands-on experience in setting up Apache Hadoop, Cloudera, MapR and Hortonworks clusters.
- Good knowledge of Apache Hadoop cluster planning, which includes choosing the hardware and operating systems to host an Apache Hadoop cluster.
- Experience with Hadoop distributions and platforms like Cloudera, Hortonworks, MapR and Windows Azure, and with Impala.
- Experience using integrated development environments like Eclipse, NetBeans, JDeveloper and MyEclipse.
- Excellent understanding of relational databases as they pertain to application development, using several RDBMSs including Oracle 10g, MS SQL Server 2005/2008 and MySQL, with strong database skills including SQL, stored procedures and PL/SQL.
- Working knowledge of J2EE development with the Spring, Struts and Hibernate frameworks in various projects, and expertise in web services (JAXB, SOAP, WSDL, RESTful) development.
- Experience in writing tests using Specs2, ScalaTest, Selenium, TestNG and JUnit.
- Ability to work on diverse application servers like JBoss, Apache Tomcat and WebSphere.
- Worked on different OS like UNIX/Linux, Windows XP, and Windows
- A passion for learning new things (new languages or new implementations) has kept me up to date with the latest trends and industry standards.
- Proficient in adapting to new work environments and technologies.
- Quick learner and self-motivated team player with excellent interpersonal skills.
- Well focused and able to meet expected deadlines on target.
- Good understanding of agile methodologies, Test Driven Development and continuous integration.
Languages: Java, Python, C#, R Programming, Machine Learning, NLP, NLTK, SQL, PL/SQL, XML, C++, HTML, CSS, JavaScript.
Java Technologies: Java, J2EE, JDBC, Servlets, JSP, JavaBeans
Big Data Technology: HDFS, MapReduce, Pig, Hive, HBase, ZooKeeper, MongoDB, Flume, Oozie, Sqoop, Avro, Kafka, Apache Spark, Ambari, Ganglia, Kerberos, Tivoli.
Frameworks: Struts, Hibernate and Spring.
Development Tools: Eclipse, MyEclipse, Tomcat, WebLogic.
Databases: Oracle, MySQL, HBase, MongoDB, Cassandra, MS SQL Server, Teradata.
Scripting Languages: JavaScript, Python, and Linux shell scripts.
Operating Systems: UNIX, Red Hat Linux (CentOS, Fedora, RHEL), Ubuntu, Windows
Methodologies: Agile, Waterfall.
Management Tools: SVN, CVS, GitHub.
Cloud Technology: GCP, Azure, AWS.
Indexing Tools: Apache Solr, Lucene, Elasticsearch
Hadoop Enterprise Distribution: CDH, HDP, MapR
Sr. Technology Analyst (Full Stack Hadoop AWS Consultant)
- Hands-on experience in Amazon Web Services (AWS) provisioning and good knowledge of AWS services like EC2, S3, Glacier, ELB (load balancers), RDS, SNS, SWF and EBS.
- Participated in multiple project architecture and strategic decision meetings to architect new software systems or modify existing ones.
- Responsible for creating and executing a data migration plan and performing the migration, including creating a schedule and timeline to complete it. Designed and implemented AWS cloud-based solutions for on-premise applications.
- Participated in various discussions with senior management, proposing new features and changing architectural designs as per the requirements.
- Implemented a Continuous Delivery framework using Jenkins, Chef and Maven in a Linux environment.
- Created Python scripts to automate AWS services, including web servers, ELB, CloudFront distributions, databases, EC2 and database security groups, S3 buckets and application configuration. The scripts create stacks and single servers, or join web servers to stacks.
- Created Python scripts that integrate with the Amazon API to control instance operations.
- Handled imported data in HDFS and Hive using HiveQL and custom MapReduce programs in Java. Imported data from Oracle into HDFS and Hive for analytical purposes.
- Used Sqoop to load data from RDBMS into HDFS.
- Populated HDFS and Cassandra with huge amounts of data using Apache Kafka.
- Managed Amazon Web Services such as EC2, RDS, EBS, ELB, Auto Scaling, AMI and IAM, along with Bitbucket, through the AWS console and API integration with Puppet code.
- Designed and worked with the team to implement the ELK (Elasticsearch, Logstash and Kibana) stack on AWS.
- Used CloudTrail, TESSA, CloudPassage, Checkmarx and Qualys scan tools for AWS security and scanning.
- Automated application, MySQL and NoSQL container deployment in Docker using Python, and monitored these containers using Nagios.
- Refined automation components with scripting and configuration management (Ansible).
- Configuration automation and centralized management with Ansible and Cobbler. Implemented Ansible to manage all existing servers and automate the build/configuration of new servers. All server types were fully defined in Ansible, so that a newly built server could be up and ready for production within 30 minutes of OS installation.
- Used various AWS services for this infrastructure. Used EC2 virtual servers to host Git, Jenkins and configuration management tools like Ansible. Converted slow, manual procedures to dynamic, API-generated procedures. Wrote Ansible scripts.
- Wrote Ansible playbooks with Python SSH as the wrapper to manage configurations of AWS nodes and tested the playbooks on AWS instances using Python. Ran Ansible scripts to provision dev servers.
- Automated various infrastructure activities like continuous deployment, application server setup and stack monitoring using Ansible playbooks, and integrated Ansible with Rundeck and Jenkins.
- Implemented Nagios and integrated it with Ansible for automatic monitoring of servers. Designed and implemented Cobbler infrastructure and integrated it with Ansible for Linux provisioning.
- Solid involvement in extracting log files generated by sources like the Kafka cluster into HDFS using Flume.
- Hands-on experience in importing other enterprise data from different data sources into HDFS using Sqoop and Flume, performing transformations using Hive, and then loading the results into HBase tables with various compression formats.
- Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Migrated complex MapReduce programs and Hive scripts into Spark RDD transformations and actions.
- Developed Scala scripts and UDFs using both SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Responsible for loading data into Spark RDDs and performing in-memory data computation to generate the output response.
- Wrote Spark SQL scripts to optimize query performance.
- Wrote UDFs/MapReduce jobs depending on the specific requirement.
- Loaded source data into HDFS by writing Java code.
- Merged all small files and loaded them into HDFS using Java code, maintaining the history of merged files in HBase.
- Implemented custom Hive UDFs to integrate healthcare and pharmaceutical business data for comprehensive data analysis.
Sr. Technical Analyst (Hadoop Developer/Administrator)
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH 5.5 distribution.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Imported data from Teradata using Sqoop with the Teradata connector. Integrated the Quartz scheduler with Oozie workflows to get data from multiple data sources in parallel using fork.
- Processed input from multiple data sources in the same reducer using GenericWritable and multiple input formats. Created a data pipeline of MapReduce programs using chained mappers.
- Exported the analysed patterns back to Teradata using Sqoop. Visualized the HDFS data for the customer using a BI tool with the help of the Hive ODBC driver.
- Familiarity with a NoSQL database such as Cassandra.
- Implemented an optimized join across different data sets to get top claims by state using MapReduce.
- Worked on big data processing of financial and server log data using MapReduce. Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache.
- Worked on implementing Spark with Scala. Responsible for importing log files from various sources into HDFS using Flume. Created a customized BI tool for the management team that performs query analytics using HiveQL. Used Hive and Pig to generate BI reports.
- Imported data from MySQL to HDFS using Sqoop on a regular basis. Created partitions and buckets based on state for further processing using bucket-based Hive joins. Created Hive generic UDFs to process business logic that varies based on policy.
- Moved relational database data using Sqoop into Hive dynamic partition tables using staging tables. Optimized Hive queries using partitioning and bucketing techniques to control the data distribution.
- Worked on custom Pig loaders and storage classes to work with a variety of data formats such as JSON and XML. Experienced with different kinds of compression techniques like LZO, GZip and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Developed unit test cases using the JUnit, EasyMock and MRUnit testing frameworks. Experienced in monitoring the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, Cassandra, Spark, MapReduce, YARN, ZooKeeper, Teradata, Java, Python, Scala, Hive, Pig, Sqoop, Flume, Oozie, Hue, SQL, ETL, Cloudera Manager, MySQL, Oracle, Apache Tika, Lucene, Apache Solr, MongoDB.
- Developed the application using the Spring Framework, which leverages the classical Model-View-Controller (MVC) architecture. UML diagrams such as use case, class, interaction and activity diagrams were used.
- Participated in requirement gathering and converting the requirements into technical specifications
- Created business logic using Servlets and session beans and deployed them on WebLogic server.
- Developed the XML Schema and Web services for the data maintenance and structures
- Implemented the web service client for login authentication, credit reports and applicant information.
- Successfully integrated Hive tables and Cassandra DB collections and developed a web service that queries the Cassandra DB collection and returns the required data to the web UI.
- Developed workflows using custom MapReduce, Pig, Hive and Sqoop.
- Built reusable Hive UDF libraries for business requirements, which enabled users to use these UDFs in Hive queries.
- Developed a data pipeline using Storm to store data into HDFS.
- Maintained Hadoop, the Hadoop ecosystem, third-party software and database(s) with updates/upgrades, performance tuning and monitoring.
- Extracted feeds from social media sites such as Facebook and Twitter using Python scripts. Responsible for modification of API packages.
- Managed and scheduled jobs on a Hadoop cluster. Installed and configured Hadoop MapReduce and HDFS. Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Created UDFs to calculate the pending payment for a given residential or small business customer, and used them in Pig and Hive scripts. Responsible for managing data coming from different sources. Developed shell and Python scripts to automate and provide control flow to Pig scripts.
- Gained good experience with NoSQL databases. Experience in managing and reviewing Hadoop log files. Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management.
- Involved in integrating web services using WSDL and UDDI. Built and deployed Java applications into multiple UNIX-based environments and produced both unit and functional test results along with release notes.