Big Data Engineer Resume
San Francisco, CA
SUMMARY:
- 8+ years of IT industry experience encompassing a wide range of skills.
- 4+ years of experience working with Big Data technologies on highly distributed, multi-application systems handling massive amounts of data, using Cloudera, MapR and IBM BigInsights Hadoop distributions.
- Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper etc.
- Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
- Good understanding of HDFS concepts, including data replication, High Availability, reading and writing data, and data flow.
- Good knowledge of setting up Hadoop clusters in different distributions.
- Experience administering and monitoring Hadoop clusters, including commissioning and decommissioning of nodes, file system checks, cluster maintenance, upgrades etc.
- Experience in designing multi-node Hadoop clusters with master and slave nodes.
- Experience with the Cloudera, MapR and IBM distributions.
- Good understanding of Hadoop YARN, the widely adopted Hadoop cluster resource management system.
- Good experience importing and exporting data between HDFS/Hive and relational database systems like MySQL using Sqoop.
- Knowledge of job workflow scheduling and monitoring with Oozie and cluster coordination with Zookeeper, and experience setting up Zookeeper on a Hadoop cluster.
- Experience running Oozie jobs daily, weekly or bi-monthly as the business requires, with the jobs executing as MapReduce.
- Experience with Pentaho Data Integration, an ETL and data visualization tool; created jobs and transformations that simplify analysis and operations.
- Good knowledge of NoSQL databases including HBase, MongoDB and MapR-DB.
- Installation, configuration and administration experience with Big Data platform tools: Cloudera Manager (Cloudera) and MCS (MapR).
- Installed the open source tools Nagios and Ganglia in different environments.
- Involved in efficiently maintaining and analyzing petabyte-scale data sets.
- Successfully ran Spark on YARN in cluster mode to improve performance.
- Installation and configuration of Pentaho Data Integration in different environments.
- Experience deploying Apache Tez on top of YARN.
- Executed complex HiveQL queries to extract required data from Hive tables created from HBase.
- Monitored MapReduce jobs and YARN applications.
- Good knowledge of Apache Solr, which is used as a search engine in different distributions.
- Extensive experience in Object Oriented Analysis and Design, JAVA/J2EE technologies and Web services.
- Extensive experience working with Oracle, MS SQL Server, DB2 and MySQL, and writing complex queries, views, triggers etc. for different data models.
- Experienced in SDLC, Agile Methodology.
- Ability to meet deadlines without compromising on delivering the right output.
- Possess strong communication and analytical skills.
TECHNICAL SKILLS:
Big Data Technologies: HDFS, Hadoop, Hive, Pig, Oozie, Zookeeper, Impala, Sqoop, MapReduce, Tez, Spark, Flume, HBase, MongoDB, Kafka, YARN
Distributions: Cloudera, MapR, IBM BigInsights, Hortonworks
Languages: JAVA, SQL, PigLatin, HiveQL, Shell Scripting
Database: NoSQL (HBase, MapR-DB, MongoDB), Oracle, MySQL, DB2, MS SQL Server, MS Access
BI Tools: Tableau, Pentaho, Talend
Software, Platforms & Tools: Eclipse, PuTTY, Cygwin, Pentaho DI, Hue, JIRA
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Big Data Engineer
Responsibilities:
- Involved in development of the Hadoop system and in improving multi-node Hadoop cluster performance.
- Involved in Hadoop cluster administration and successfully maintained large volumes of storage.
- Ran Oozie jobs daily, weekly and bi-monthly as required to report on MapR-FS storage usage and support capacity planning.
- Developed external tables in Hive that can be queried to obtain the data required for analysis.
- Wrote HiveQL queries to structure the data in a tabular format.
- Created tables in Hive and wrote data into them using Talend Hive components.
- Administered the cluster, including commissioning and decommissioning of data nodes, backup and recovery, file system management, cluster performance and maintaining a healthy cluster in the MapR distribution, which uses MCS for cluster monitoring.
- Used Storm for clickstream analysis, which is useful for understanding online customer experience, and started using Talend in this project for the same purpose.
- Experience in managing and reviewing Hadoop log files.
- Worked with Sqoop to transfer data between MapR-FS and relational databases like MySQL in both directions, and used Talend for Sqoop.
- Involved in installation of Nagios and Ganglia, which are tools for provisioning and monitoring the Hadoop cluster and viewing the health of the cluster.
- Used Apache Spark on YARN for fast large-scale data processing and increased performance.
- Created jobs and transformations in Pentaho Data Integration, an ETL tool, to support customer behavioral analysis.
- Involved in writing MapReduce jobs.
- Experience with Drill, a SQL engine for big data that can deliver secure, interactive SQL analytics at petabyte scale.
- Installed Apache Tez, a programming framework built on YARN, to increase performance.
- Implemented Zookeeper for the cluster to coordinate concurrent access.
- Experience in writing MapReduce jobs and streaming jobs.
- Experience in troubleshooting the issues and failed jobs in the Hadoop Cluster.
- Able to tackle problems and accomplish the tasks due during the sprint.
Environment: MapR-FS, M4&5, MCS, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Storm, Spark, YARN, Zookeeper, Oozie, HBase, MapR-DB, Pentaho DI, Maven, Linux, Talend.
Confidential, Jacksonville, FL
Hadoop Developer
Responsibilities:
- Analyzed the Hadoop cluster using different big data analytic tools, including Hive and MapReduce.
- Involved in increasing system performance by adding real-time components like Flume and Spark to the platform.
- Installed and configured Spark, Flume, Zookeeper, Ganglia and Nagios on the Hadoop cluster.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed MapReduce programs for data analysis and data cleaning.
- Worked with the Apache Crunch library to write, test and run MapReduce pipeline jobs.
- Developed Pig Latin scripts for the analysis of semi-structured data.
- Continuously monitored and provisioned the Hadoop cluster through Cloudera Manager.
- Created Hive tables, loaded data and wrote Hive UDFs.
- Worked on Impala to obtain fast results without any transformation of data.
- Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
- Used Tableau for visualizing and analyzing the data.
- Experience using the Solr search engine for indexing and searching data.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.0, Sqoop, Kafka, Storm, Oozie, HBase, Cloudera Manager, Crunch, Tableau, Linux.
Confidential, Dallas, TX
Big Data Software Developer
Responsibilities:
- Involved in designing the Hadoop architecture.
- Responsible for administering the Hadoop system in the IBM BigInsights distribution, including commissioning and decommissioning data nodes, cluster performance, maintaining cluster health, monitoring the system in the web console etc.
- Imported and exported data between relational database systems like DB2 and HDFS/Hive using Sqoop.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Developed automated job flows and ran them through Oozie daily and on demand; the flows run MapReduce jobs internally.
- Wrote HiveQL queries on Hive external tables created from HBase and generated reports from the data for analysis.
- Developed Pig scripts as an ETL tool to transform and aggregate data before loading it into HDFS.
- Experience with Storm and Kafka for ingesting streams of data.
- Worked on Apache Solr, which is used as an indexing and search engine.
- Developed unit test cases for MapReduce code using MRUnit.
- Experience with Big SQL, an interactive, low-latency SQL engine that is useful for business.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, IBM BigInsights V2.0, Sqoop, Kafka, Storm, Lucene, Oozie, HBase, Big SQL, JAVA and Red Hat Enterprise Linux.
Confidential, Milpitas, CA
Hadoop Developer
Responsibilities:
- Introduced and developed the architecture for a data platform service based on the Apache open source Hadoop ecosystem with HDFS, Flume, Solr, Impala and Hive to ingest, store, index and analyze big data.
- Evaluated NoSQL data store solutions and delivered recommendations.
- Migrated data from a traditional database to the NoSQL database MongoDB to analyze the influx of data using Hadoop ecosystem tools and optimize business processes.
- Used Sqoop to import data into HDFS and Hive from other data systems.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
- Migrated ETL processes from RDBMS to Hive to test ease of data manipulation.
- Experience writing and running MapReduce jobs on MongoDB data and returning results to MongoDB.
- Good understanding of choosing NoSQL databases for Hadoop.
Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH3.0, Zookeeper, Sqoop, Oozie, MongoDB, Cloudera Manager, Linux.
Confidential
Java Developer
Responsibilities:
- Developed web components using JSP, Servlets and JDBC.
- Analyzed the use cases to understand the business requirements and to assess the technical implementation of the functionality.
- Used the Java Mail API extensively to send automated emails whenever ticket status or workflow steps changed.
- Designed, implemented, tested and deployed Enterprise Java Beans (both Session and Entity) using WebLogic as the application server.
- Used tools like TOAD for SQL operations on Oracle Database.
- Developed database interaction code using the JDBC API, making extensive use of SQL query statements and advanced Prepared Statements.
- Used connection pooling through the JDBC interface for optimization.
- Used EJB entity and session beans to implement business logic and session handling and transactions. Developed user-interface using JSP, Servlets, and JavaScript.
- Wrote complex SQL queries and stored procedures.
- Used JavaScript for client-side validation.
Environment: JSPs, Servlets, Java Beans, UML, JDK 1.5, Oracle, TOAD, JavaScript, HTML and CSS.
Confidential
Java Developer
Responsibilities:
- Developed user interface templates using Spring MVC and JSP.
- Involved in development of form validations using simple form controller.
- Responsible for implementation of controllers like simple form controller.
- Implemented design patterns including DAO, Singleton, Business Delegate and Strategy.
- Used the Spring 2.0 framework to implement the Spring MVC design pattern.
- Designed, developed and deployed the J2EE components on Tomcat.
- Used Hibernate for O/R mapping on the Oracle database.
- Involved in Transaction management and AOP using Spring.
Environment: JAVA/J2EE, JSP, Spring 2.0 framework, Oracle, Hibernate.
