- 8+ years of professional IT experience, including 5+ years of Hadoop experience; capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
- Well experienced in Hadoop ecosystem components such as MapReduce, Cloudera, Hortonworks, Mahout, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in using Automation tools like Chef for installing, configuring and maintaining Hadoop clusters.
- Lead innovation by exploring, investigating, recommending, benchmarking and implementing data centric technologies for the platform.
- Technical leadership role responsible for developing and maintaining data warehouse and Big Data roadmap ensuring Data Architecture aligns to business centric road map and analytics capabilities.
- Experienced in Hadoop Architect and Technical Lead roles, providing design solutions and Hadoop architectural direction.
- 4+ years of industrial experience in data manipulation and big data analytics using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Cassandra, Solr, and ZooKeeper.
- Hands-on expertise in row-key and schema design with NoSQL databases such as MongoDB 3.0.1, HBase, Cassandra, and DynamoDB (AWS).
- Extensively worked on Spark with Scala for cluster analytics, installed on top of Hadoop, and built advanced analytical applications combining Spark with Hive and SQL/Oracle.
- Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
- Hands on experience in developing SPARK applications using Spark API's like Spark core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience and knowledge of real time data analytics using Spark, Kafka and Flume.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Ran Apache Hadoop, CDH, and MapR distributions on Amazon Elastic MapReduce (EMR) clusters backed by EC2.
- Expertise in developing Pig Latin scripts and using Hive Query Language.
- Developed customized UDFs and UDAFs in Java to extend core Hive and Pig functionality.
- Created Hive tables to store structured data into HDFS and processed it using HiveQL.
- Worked on GUI-based Hive interaction tools like Hue and Karmasphere for querying the data.
- Experience in validating and cleansing the data using Pig statements and hands-on experience in developing Pig MACROS.
- Designed ETL workflows, deployed data from various sources to HDFS, and generated reports using Tableau.
- Performed clustering, regression, and classification using machine-learning libraries Mahout and Spark MLlib.
- Good experience with use-case development and software methodologies such as Agile and Waterfall.
- Working knowledge of installing and maintaining Cassandra by configuring the cassandra.yaml file per business requirements, and performed reads/writes using Java JDBC connectivity.
- Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data warehouses.
- Wrote multiple MapReduce jobs using the Java API, Pig, and Hive for data extraction, transformation, and aggregation across multiple file formats, including Parquet, Avro, XML, JSON, CSV, and ORC.
- Good knowledge of build tools like Maven, Gradle, and Ant.
- Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache, and Amazon EMR.
- Knowledge in installation, configuration, supporting and managing Hadoop Clusters using Apache, Cloudera (CDH3, CDH4) distributions and on Amazon web services (AWS).
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
- Used project management services like JIRA for tracking issues and bugs related to code, and GitHub for code reviews.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multithreading, and serialization/deserialization of streaming applications.
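The MapReduce work listed above can be illustrated with a minimal Hadoop Streaming-style word count in Python. This is a generic sketch of the programming model, not code from any project listed here; the function and variable names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit (word, 1) pairs, as a streaming mapper
    would write key/value lines to stdout."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum counts per key. Sorting stands in for the
    shuffle, which delivers pairs grouped by key."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    sample = ["big data big wins", "data pipelines"]
    print(dict(reducer(mapper(sample))))  # word -> count
```

In a real Hadoop Streaming job, the mapper and reducer would run as separate processes reading stdin and writing stdout, with the framework handling the shuffle between them.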
APACHE (6 years), APACHE HADOOP HDFS (6 years), APACHE HADOOP SQOOP (6 years), APACHE HBASE (6 years), Hadoop (6 years), IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting (less than 1 year)
Confidential, San Francisco, CA
- Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Installed Hadoop, MapReduce, and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Coordinated with business customers to gather business requirements, interacted with technical peers to derive technical requirements, and delivered the BRD and TDD documents.
- Extensively involved in the Design phase and delivered Design documents.
- Involved in Testing and coordination with business in User testing.
- Importing and exporting data into HDFS and Hive using SQOOP.
- Wrote Hive jobs to parse logs and structure them in tabular format to enable effective querying of the log data.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experienced in defining job flows.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Experienced in managing and reviewing the Hadoop log files.
- Used Pig as an ETL tool for transformations, including joins and some pre-aggregations, before storing the data onto HDFS.
- Loaded and transformed large sets of structured data.
- Responsible for managing data coming from different sources.
- Involved in creating Hive Tables, loading data and writing Hive queries.
- Utilized the Apache Hadoop environment provided by Cloudera.
- Created the data model for Hive tables.
- Involved in Unit testing and delivered Unit test plans and results in documents.
- Exported data from HDFS environment into RDBMS using Sqoop for report generation and visualization purpose.
- Worked on Oozie workflow engine for job scheduling.
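The Hive work above (creating tables over HDFS data and querying them via MapReduce) typically follows a pattern like this sketch; the table, column, and path names are illustrative, not drawn from the project:

```sql
-- Illustrative only: a tab-delimited log table over raw HDFS data.
CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
  ip     STRING,
  url    STRING,
  status INT
)
PARTITIONED BY (log_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/raw/web_logs';

-- Hive compiles this query into MapReduce jobs under the hood.
SELECT log_date, status, COUNT(*) AS hits
FROM web_logs
WHERE log_date = '2016-01-01'
GROUP BY log_date, status;
```

Declaring the table EXTERNAL keeps the underlying HDFS files intact if the table is dropped, which suits data landed by Sqoop or Flume.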
Environment: Apache Hadoop 2.0.0, Pig 0.11, Hive 0.10, Sqoop 1.4.3, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, Linux, Oozie, Cassandra, Hue, HCatalog, Java, IBM Cognos, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting
Confidential, Stamford, CT
- Responsible for building scalable distributed data solutions using Hadoop.
- Understood business needs, analyzed functional specifications, and mapped them to development tasks.
- Involved in loading data from Mainframe DB2 into HDFS using Sqoop.
- Handled delta processing and incremental updates using Hive.
- Responsible for daily ingestion of data from DATALAKE to CDB Hadoop tenant system.
- Developed Pig Latin scripts to transform data while extracting it from the source system.
- Worked on data-issue tickets and provided fixes.
- Monitored and fixed production job failures.
- Reviewed team members' design documents and code.
- Documented the systems processes and procedures for future references including design and code reviews.
- Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
- Implemented data ingestion from multiple sources like IBM Mainframes, Oracle.
- Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
- Worked on partitioning and used bucketing in HIVE tables and running the scripts in parallel to improve the performance.
- Thorough knowledge of Spark architecture and how RDDs work internally.
- Exposure to Spark SQL.
- Experience in the Scala programming language, used extensively with Spark for data processing.
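The delta processing mentioned above, on pre-ACID Hive, is commonly handled with a reconcile-and-overwrite pattern like this sketch (table and column names are hypothetical, not from the project):

```sql
-- Illustrative incremental-update pattern: merge the incoming delta with
-- the base table and keep the latest record per key.
INSERT OVERWRITE TABLE customers
SELECT t.id, t.name, t.updated_at
FROM (
  SELECT id, name, updated_at,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
  FROM (
    SELECT id, name, updated_at FROM customers
    UNION ALL
    SELECT id, name, updated_at FROM customers_incoming
  ) merged
) t
WHERE t.rn = 1;
```

The window function picks the newest version of each row, so the overwrite leaves one reconciled copy of the table without requiring Hive transactions.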
Environment: Hadoop v2/YARN 2.4, Spark, AWS, MapReduce, Teradata 15.0, Hive, REST, Sqoop, Flume, Pig, Cloudera, Kafka, SSRS.
Confidential, Golden, CO
- Worked on analyzing Hadoop cluster using different big data analytic tools including Flume, Pig, Hive, Sqoop & Spark.
- Developed Spark code using Scala for faster processing of data.
- Followed the Agile development methodology to develop the application.
- Installed and configured Hadoop clusters on different platforms such as Cloudera, Pivotal HD, and AWS EMR, along with ecosystem components like Sqoop, HBase, Hive, and Spark.
- Developed Spark SQL to load tables into HDFS to run select queries on top.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Used Spark Streaming to divide streaming data into batches as an input to Spark engine for batch processing.
- Highly skilled in integrating Kafka with Spark Streaming for high-speed data processing.
- Used Spark Data frames, Spark-SQL, Spark MLLib extensively.
- Integrated Apache Storm with Kafka to perform web analytics.
- Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Designed the ETL process and created the high-level design document including the logical data flows, source data extraction process, the database staging and the extract creation, source archival, job scheduling and Error Handling.
- Worked on the Talend ETL tool, using features like context variables and database components such as tOracleInput, tOracleOutput, tFileCompare, tFileCopy, and tOracleClose.
- Created ETL Mapping with Talend Integration Suite to pull data from Source, apply transformations, and load data into the target database.
- Developed ETL mappings using mapplets and reusable transformations, including Source Qualifier, Expression, connected and unconnected Lookup, Router, Aggregator, Filter, Sequence Generator, Update Strategy, Normalizer, Joiner, and Rank transformations in PowerCenter Designer.
- Created, altered, and deleted Kafka topics (queues) as required.
- Performance tuning using partitioning and bucketing of Impala tables.
- Experience with NoSQL databases such as HBase and MongoDB; involved in cluster maintenance and monitoring.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Involved in loading data from UNIX file system to HDFS.
- Created an email notification service triggered upon job completion to notify the team that requested the data.
- Worked on NoSQL databases, which differ from classic relational databases.
- Conducted requirements-gathering sessions with various stakeholders.
- Involved in knowledge transition activities to the team members.
- Successful in creating and implementing complex code changes.
- Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
- Configured EC2 instances in a VPC network, managed security through IAM, and monitored server health through CloudWatch.
- Experience in S3, CloudFront, and Route 53.
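The Spark Streaming work above relies on the micro-batch model: an unbounded stream is sliced into small batches that the batch engine processes one at a time. The core idea, minus Spark itself, can be sketched in plain Python (a conceptual illustration only; Spark batches by time interval rather than record count, and the names here are invented):

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Group an unbounded iterator into fixed-size batches, analogous to
    how Spark Streaming slices a DStream into per-interval RDDs."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        # In Spark, each batch would be handed to the batch engine here.
        yield batch

if __name__ == "__main__":
    events = range(7)
    print([sum(b) for b in micro_batches(events, 3)])
```

Each yielded batch is processed with ordinary batch logic, which is what lets Spark reuse its batch engine for streaming workloads.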
Environment: HDFS, Hive, Pig, HBase, Unix Shell Script, Talend, Spark, Scala.
Confidential, Memphis, TN
- Worked on a Hadoop cluster that ranged from 4-8 nodes during the pre-production stage and was at times extended up to 24 nodes in production.
- Built APIs that will allow customer service representatives to access the data and answer queries.
- Designed changes to transform current Hadoop jobs to HBase.
- Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
- Responsible for Cluster maintenance, Monitoring, commissioning and decommissioning Data nodes, Troubleshooting, Manage and review data backups, Manage & review log files.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Developed Spark applications using Scala.
- The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports and established self-service reporting model in Cognos for business users.
- Implemented Bucketing and Partitioning using Hive to assist the users with data analysis.
- Used Oozie scripts for application deployment and Perforce as the secure versioning software.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Extracted large volumes of data feed on different data sources, performed transformations and loaded the data into various Targets.
- Developed database management systems for easy access, storage, and retrieval of data.
- Performed DB activities such as indexing, performance tuning, and backup and restore.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS), then analyzed the imported data using Hadoop components.
- Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data-flow language), and custom MapReduce programs in Java.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
- Responsible for loading the data from BDW Oracle database, Teradata into HDFS using Sqoop.
- Implemented AJAX, JSON, and JavaScript to create interactive web screens.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
- Involved in creating Hive tables and applying HiveQL queries to them, which automatically invoke and run MapReduce jobs.
- Supported applications running on Linux machines.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
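The custom Hive UDF and map-side join work described above generally looks like this sketch in HiveQL; the jar path, class name, function name, and tables are all hypothetical examples, not artifacts from the project:

```sql
-- Hypothetical names: registering a custom Java UDF with Hive.
ADD JAR /tmp/custom-udfs.jar;
CREATE TEMPORARY FUNCTION normalize_phone AS 'com.example.udf.NormalizePhone';

-- Map-side join: let Hive broadcast the small dimension table
-- to every mapper instead of shuffling the large fact table.
SET hive.auto.convert.join=true;
SELECT /*+ MAPJOIN(d) */ f.order_id, normalize_phone(f.phone), d.region
FROM orders f
JOIN region_dim d ON f.region_id = d.id;
```

Broadcasting the small table via the distributed cache avoids the reduce-side shuffle, which is the optimization the bullets above refer to.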
Environment: HTML5, SCSS, CSS3, Mixpanel, Mustache, Glyphicons, Bootstrap, AngularJS, Spring AOP, Hibernate, Promises, Bower, NPM, React.js, Redux, .NET, AWS, RESTful, Node.js.
- Involved in requirements collection & analysis from the business team.
- Created the design documents with use case, class, and sequence diagrams using Rational Rose.
- Implemented the MVC architecture using Apache Struts framework.
- Implemented Action classes and server-side validations for account activity, payment history, and transactions.
- Implemented views using Struts tags, JSTL, and Expression Language.
- Implemented session beans to handle the business logic for the fund transfer, loan, credit card, and fixed deposit modules.
- Worked with Java patterns such as Singleton and Factory at the business layer for effective object behaviors.
- Worked with the Java Collections API for handling data objects between the business layers and the front end.
- Developed unit test cases using JUnit.
- Developed Ant scripts and produced builds using Apache Ant.
- Used ClearCase for source code maintenance.