- 8+ years of professional IT experience, including 5+ years of Hadoop experience, processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Well experienced in Hadoop ecosystem components and distributions such as Hadoop, MapReduce, Cloudera, Hortonworks, Mahout, HBase, Oozie, Hive, Sqoop, Pig, and Flume.
- Experience in using Automation tools like Chef for installing, configuring and maintaining Hadoop clusters.
- Lead innovation by exploring, investigating, recommending, benchmarking and implementing data centric technologies for the platform.
- Technical leadership role responsible for developing and maintaining data warehouse and Big Data roadmap ensuring Data Architecture aligns to business centric road map and analytics capabilities.
- Experienced in Hadoop Architect and Technical Lead roles, providing design solutions and Hadoop architectural direction.
- 4+ years of industry experience in data manipulation and Big Data analytics using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Cassandra, Solr and ZooKeeper.
- Hands-on expertise in designing row keys and schemas for NoSQL databases such as MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
- Worked extensively with Spark using Scala on a cluster for computational analytics; with Spark installed on top of Hadoop, built advanced analytical applications using Spark with Hive and SQL/Oracle.
- Excellent Programming skills at a higher level of abstraction using Scala, Java and Python.
- Hands-on experience in developing Spark applications using Spark APIs such as Spark Core, Spark MLlib, Spark Streaming and Spark SQL.
- Strong experience and knowledge of real time data analytics using Spark, Kafka and Flume.
- Working knowledge of Amazon's Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Ran Apache Hadoop, CDH and MapR distributions on EC2, and used Amazon Elastic MapReduce (EMR).
- Expertise in developing Pig Latin scripts and using Hive Query Language.
- Developed custom UDFs and UDAFs in Java to extend Hive and Pig core functionality.
- Created Hive tables to store structured data into HDFS and processed it using HiveQL.
- Worked with GUI-based Hive interaction tools such as Hue and Karmasphere for querying data.
- Experience in validating and cleansing data using Pig statements and hands-on experience in developing Pig macros.
- Designed ETL workflows on Tableau, deployed data from various sources to HDFS and generated reports using Tableau.
- Performed clustering, regression and classification using the machine learning libraries Mahout and Spark MLlib.
- Good experience with use-case development, with Software methodologies like Agile and Waterfall.
- Working knowledge of installing and maintaining Cassandra by configuring the cassandra.yaml file per business requirements, and performing reads/writes using Java JDBC connectivity.
- Experience in OLTP and OLAP design, development, testing, implementation and support of enterprise Data warehouses.
- Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC.
- Good knowledge of build tools such as Maven, Gradle and Ant.
- Hands-on experience with various Hadoop distributions: Cloudera (CDH 4/CDH 5), Hortonworks, MapR, IBM BigInsights, Apache and Amazon EMR.
- Knowledge of installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
- Used various Project Management services like JIRA for tracking issues, bugs related to code and GitHub for various code reviews.
- Hands-on knowledge of core Java concepts such as exceptions, collections, data structures, I/O, multi-threading, and serialization and deserialization in streaming applications.
Big Data Technologies: HDFS, Hive, Hana, AWS, MapReduce, Pig, Sqoop, Kafka, Storm, Oozie, ZooKeeper, YARN, Avro, EMR, Spark.
Scripting Languages: Shell, Python, Perl, Scala
Tools: Quality Center v11.0/ALM, TOAD, JIRA, HP QTP, HP UFT, Selenium, TestNG, JUnit
Programming Languages: Java, C++, C, SQL, PL/SQL, Pig Latin, HQL, CQL
Methodologies: Waterfall, Agile, V-model.
Java Frameworks: MVC, jQuery, Apache Struts 2.0, Spring and Hibernate
Defect Management: Jira, Quality Center.
Domain Knowledge: GSM, WAP, GPRS, CDMA and UMTS (3G)
Web Services: SOAP (JAX-WS), WSDL, SOA, Restful (JAX-RS), JMS
Application Servers: Apache Tomcat, WebLogic Server, WebSphere, JBoss
Version Control: Git, SVN, CVS
Databases: Oracle 11g, MySQL, MS SQL Server, IBM DB2
NoSQL Databases: HBase, MongoDB, Cassandra, DataStax Enterprise 4.6.1
RDBMS: Oracle 9i, Oracle 10g, MS Access, MS SQL Server, IBM DB2, and PL/SQL.
Operating Systems: Linux, UNIX, Mac, Windows NT/98/2000/XP/Vista, Windows 7, Windows
Hadoop/Big Data Developer
Confidential - San Ramon, CA
- Worked on a Hadoop cluster that ranged from 4-8 nodes during pre-production and was sometimes extended up to 24 nodes during production.
- Built APIs that will allow customer service representatives to access the data and answer queries.
- Designed changes to transform current Hadoop jobs to HBase.
- Analyzed HBase data in Hive by creating external partitioned and bucketed tables.
- Analyzed large and critical datasets using Cloudera, HDFS, HBase, MapReduce, Hive, Hive UDF, Pig, Sqoop, Zookeeper and Spark.
- Handled fixing of defects efficiently and worked with the QA and BA team for clarifications.
- Responsible for cluster maintenance, monitoring, commissioning and decommissioning of data nodes, troubleshooting, and managing and reviewing data backups and log files.
- Extended the functionality of Hive and Pig with custom UDFs and UDAFs.
- Developed Spark applications using Scala.
- The new Business Data Warehouse (BDW) improved query/report performance, reduced the time needed to develop reports and established a self-service reporting model in Cognos for business users.
- Implemented Bucketing and Partitioning using Hive to assist the users with data analysis.
- Used Oozie scripts for deployment of the application and Perforce as the secure versioning software.
- Used Pig to validate data ingested through Sqoop and Flume and pushed the cleansed data set into MongoDB.
- Used Kafka to rebuild a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Implemented a Cassandra connection with Spark Resilient Distributed Datasets (RDDs).
- Extracted large volumes of data from different data sources, performed transformations and loaded the data into various targets.
- Developed database management systems for easy access, storage, and retrieval of data.
- Performed DB activities such as indexing, performance tuning, and backup and restore.
- Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Expertise in writing Hadoop jobs for analyzing data using HiveQL (queries), Pig Latin (data flow language), and custom MapReduce programs in Java.
- Applied performance optimizations such as the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Expert in creating Pig and Hive UDFs using Java to analyze data efficiently.
- Responsible for loading data from the BDW Oracle database and Teradata into HDFS using Sqoop.
- Implemented AJAX, JSON, and JavaScript to create interactive web screens.
- Wrote data ingestion systems to pull data from traditional RDBMS platforms such as Oracle and Teradata and store it in NoSQL databases such as MongoDB.
- Created Hive tables and ran HiveQL queries on them, which automatically invoke and run MapReduce jobs.
- Supported applications running on Linux machines.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Participated in requirement gathering from the Experts and Business Partners and converting the requirements into technical specifications.
- Used Zookeeper to manage coordination among the clusters.
- Analyzed the Cassandra database and compared it with other open-source NoSQL databases to determine which best suits the current requirements.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Assisted application teams in installing Hadoop updates, operating system patches and version upgrades when required.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managed and reviewed data backups and log files.
Environment: Apache Hadoop, Pig 0.11, Hive, HBase, Sqoop, Flume, MapReduce, JSP, Struts 2.0, NoSQL, HDFS, Teradata, Linux, Oozie, ZooKeeper, Cassandra, Hue, Spark, Storm, HCatalog, Java, IBM Cognos, MongoDB, Oracle 11g/10g, Microsoft SQL Server, Microsoft SSIS, DB2 LUW, TOAD for DB2, IBM Data Studio, AIX 6.1, UNIX Scripting.
Confidential - San Jose, CA
- Loaded data into HDFS, analyzed it using MapReduce, Pig and Hive, and delivered summary results from Hadoop to downstream systems.
- Expertise in Unix/Linux Shell Scripting.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Read data from Flume and pushed batches of data to HDFS and HBase for real-time processing of the files.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using HQL.
- Developed Hive user-defined functions in Java, compiled them into JARs, added them to HDFS and executed them with Hive queries.
- Developed optimal strategies for distributing the web log data over the cluster, importing and exporting the stored web log data into HDFS and Hive using Kafka connectors.
- Experienced in managing and reviewing Hadoop log files.
- Imported data from SQL databases into HDFS and Hive for analytical purposes.
- Involved in creating Hive tables, loading them with data and writing Hive queries, which invoke and run MapReduce jobs in the backend.
- Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
- Handled different input sources using MultipleInputs with GenericWritable and ObjectWritable. Used ZooKeeper for cluster coordination services.
- Worked extensively with Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
- Extensive experience working with Business Intelligence Data Visualization Tools with specialization on Tableau.
- Implemented the workflows using Apache Oozie framework to automate tasks.
- Developed scripts and automated data management from end to end and sync up between all the clusters.
- Involved in the POC implementation of migrating MapReduce programs into Spark transformations using Spark and Scala.
- Continuous coordination with QA team, production support team and deployment team.
- Responsible for all kinds of data testing activities related to Hadoop before Deployment.
- Tested XML feeds received from a third-party source for data consistency.
- Tested the ETL with XML as source and tables in the data warehouse as target. Implemented test scripts to support test driven development and continuous integration.
- Migrated an existing on-premises application to AWS.
- Used AWS services such as EC2 and S3 for small data sets.
- Used CloudWatch Logs to move application logs to S3 and created alarms based on a few exceptions raised by applications.
Environment: Hadoop, HDFS, MapReduce, Unix Shell Scripting, Linux Shell Scripting, ZooKeeper, Hive, HQL, Oozie, HBase, Flume, Tableau, Pentaho, Java, Cloudera, MySQL, SQL, Spark, Storm, Scala, AWS, EC2, S3.
- Worked extensively on importing data using Sqoop and Flume.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Responsible for creating complex tables using Hive and developing Hive queries for the analysts.
- Created partitioned tables in Hive for best performance and faster querying.
- Transferred data to HBase using Pig.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
- Involved in source system analysis, data analysis and data modeling for ETL; migrated corporate Linux servers from physical servers to AWS virtual servers.
- Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
- Used ZooKeeper to coordinate cluster services. Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Handled structured and unstructured data and applied ETL processes.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Developed Pig UDFs to pre-process the data for analysis.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Assisted in cluster maintenance, cluster monitoring and troubleshooting, and managed and reviewed data backups and log files.
Environment: Hadoop, Apache, AWS EC2, Sqoop, Pig, Hive, HBase, Oozie, MapReduce, ZooKeeper, Java (JDK 1.6), flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Cloudera, Flume, CentOS, Maven.
- Identified data source system integration issues and proposed feasible integration solutions.
- Partnered with business users and DW designers to understand the processes of the development methodology, and then implemented the ideas in development accordingly.
- Worked with the data modeler to develop star and snowflake schemas.
- Worked on loading the data from MySQL to HBase where necessary using Sqoop.
- Created Oracle PL/SQL queries and Stored Procedures, Packages, Triggers, Cursors and backup-recovery for the various tables.
- Supported MapReduce programs running on the cluster.
- Identified and tracked slowly changing dimensions (SCD).
- Imported several transactional logs from web servers with Flume to ingest the data into HDFS. Used Flume with a spooling directory to load data from the local file system (LFS) into HDFS.
- Extracted data from Oracle, flat-file and Excel sources and applied complex Joiner, Expression, Aggregator, Lookup, Stored Procedure, Filter, Router and Update Strategy transformations to load data into the target systems.
- Developed Hive scripts for implementing dynamic partitions.
- Loaded data from the UNIX file system into HDFS and wrote Hive user-defined functions.
- Developed code to pre-process large sets of files in various formats such as text, Avro, sequence files, XML, JSON and Parquet.
- Created multi-stage MapReduce jobs in Java for ad-hoc purposes.
- Scheduled the workflows at specified frequency according to the business requirements and monitored the workflows using Workflow Monitor.
Environment: PL/SQL, Teradata 14.1, JSON, Hadoop (HDFS), MapReduce, Pig, HBase, Spark, RStudio, Mahout, Java, Hive, AWS.
- Worked with business analysts to understand business requirements and on the design and development of the project.
- Implemented the JSP framework with MVC architecture.
- Created new JSPs for the front end using HTML, JavaScript, jQuery, and Ajax.
- Involved in creating RESTful web services using JAX-RS and Jersey.
- Involved in designing, creating, reviewing Technical Design Documents.
- Developed DAOs (Data Access Object) using Hibernate as ORM to interact with DBMS - Oracle.
- Applied J2EE design patterns like Business Delegate, DAO and Singleton.
- Involved in developing DAOs using JDBC.
- Worked with QA team in preparation and review of test cases.
- Used JUnit for unit and integration testing.
- Wrote SQL queries to fetch business data from the Oracle database.
- Developed UI for Customer Service modules and reports using JSF, JSPs and MyFaces components.
- Used Log4j to log the running application in order to trace errors and certain automated routine functions.