
Big Data Engineer Resume


NJ

SUMMARY:

  • 8+ years of IT industry experience encompassing a wide range of skill sets.
  • 4 years of experience working with Big Data technologies on highly distributed systems comprising several applications and massive amounts of data, using the Cloudera, MapR and Confidential BigInsights Hadoop distributions.
  • Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, Zookeeper, etc.
  • Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Good understanding of HDFS concepts such as data replication, high availability, reading/writing data to HDFS and data flow.
  • Good knowledge of setting up Hadoop clusters on different distributions.
  • Experience administering and monitoring Hadoop clusters, including commissioning and decommissioning of nodes, file system checks, cluster maintenance and upgrades.
  • Experience designing multi-node Hadoop clusters with master and slave nodes.
  • Experience with the Cloudera, MapR and Confidential distributions.
  • Set configurations, installed licenses, and joined nodes to the cluster.
  • Performed installation and execution tasks and validated reports for data migrations.
  • Good understanding of Hadoop YARN, the widely adopted cluster resource management system for Hadoop.
  • Good experience importing and exporting data between HDFS/Hive and relational database systems such as MySQL using Sqoop (see the sketch after this summary).
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie and Zookeeper, and experience setting up Zookeeper on a Hadoop cluster.
  • Experience running Oozie jobs daily, weekly or bi-monthly as required by the business; these workflows execute as MapReduce jobs.
  • Experience with the ETL and data visualization tool Pentaho Data Integration; created jobs and transformations that simplified analysis and operational tasks.
  • Good knowledge of NoSQL databases including HBase, MongoDB and MapR-DB.
  • Installation, configuration and administration experience on Big Data platforms with Cloudera Manager (Cloudera) and MCS (MapR).
  • Involved in efficiently maintaining and analyzing large data sets at petabyte scale.
  • Successfully ran Spark in YARN cluster mode to improve processing performance.
  • Installed and configured Pentaho Data Integration in different environments.
  • Experience deploying Apache Tez on top of YARN.
  • Executed complex HiveQL queries to extract required data from Hive tables created over HBase.
  • Monitored MapReduce jobs and YARN applications.
  • Good knowledge of Apache Solr, used as a search engine across different distributions.
  • Extensive experience with Object-Oriented Analysis and Design, Java/J2EE technologies and web services.
  • Extensive experience working with Oracle, MS SQL Server, DB2 and MySQL, writing complex queries, views and triggers for different data models.
  • Experienced in the SDLC and Agile methodology.
  • Ability to meet deadlines without compromising the quality of deliverables.
  • Strong communication and analytical skills.
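
A minimal shell sketch of the Sqoop import/export pattern referenced above; the connection string, database, table and path names are illustrative placeholders, not actual project values.

  # Import a MySQL table into HDFS and a Hive table, then export summary results back to MySQL
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /user/etl/orders \
    --hive-import --hive-table analytics.orders \
    --num-mappers 4

  sqoop export \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table order_summary \
    --export-dir /user/etl/order_summary \
    --num-mappers 4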

TECHNICAL SKILLS:

Big Data Technologies: HDFS, Hadoop, Hive, Pig, Oozie, Zookeeper, Impala, Sqoop, MapReduce, Tez, Spark, Flume, HBase, MongoDB, Kafka, YARN

Distributions: Cloudera, MapR, IBM BigInsights, Hortonworks

Languages: Java, SQL, Pig Latin, Scala, HiveQL, Python, Shell Scripting

Database: NoSQL (HBase, MapR-DB, MongoDB), Oracle, MySQL, DB2, MS SQL Server, MS Access

BI Tools: Tableau, Pentaho, Talend

Software, Platforms & Tools: Eclipse, PuTTY, Cygwin, Pentaho DI, Hue, JIRA

PROFESSIONAL EXPERIENCE:

Confidential, NJ

Big Data Engineer

Responsibilities:
  • Handled day-to-day activities, weekly status updates, and technical upgrade activities for various applications, along with process and performance improvements.
  • Utilized data engineering skills within and outside the developing Chubb information ecosystem for discovery, analytics, and data management.
  • Worked with architects, business partners and business analysts to understand requirements and to design and build effective solutions.
  • Used data wrangling techniques to convert data from one "raw" form into another, including data visualization, data aggregation, training statistical models, etc.
  • Created different levels of abstraction of the data depending on analytics needs.
  • Performed hands-on data preparation activities using Hadoop technologies.
  • Implemented discovery solutions for high-speed data ingestion.
  • Worked with the data leadership team to perform complex analytics and data preparation tasks.
  • Worked with various relational and non-relational data sources, with Hadoop-based repositories as the target.
  • Sourced data from multiple applications, then profiled, cleansed and conformed it to create master data sets for analytics use.
  • Designed solutions for managing highly complex business rules within the Hadoop ecosystem.
  • Performance-tuned data loads.
  • Remained flexible in taking up additional responsibilities when needed.
  • Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
  • Involved in Hadoop cluster administration and maintained large volumes of storage.
  • Developed external tables in Hive that are queried to obtain the data required for analysis.
  • Administered the cluster on the MapR distribution (monitored via MCS), including commissioning and decommissioning of data nodes, backup and recovery, file system management and cluster performance, keeping the cluster healthy.
  • Experience programming in Scala and Python.
  • Knowledge of the Akka framework with Scala.
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
  • Analyzed the Cassandra/SQL scripts and designed the solution to be implemented in Scala.
  • Used Storm for clickstream analysis, which is valuable for understanding the online customer experience, and started using Talend in this project for that purpose.
  • Migrated data from a traditional database to NoSQL (MongoDB) and analyzed the influx of data using Hadoop ecosystem tools to optimize business processes.
  • Experience managing and reviewing Hadoop log files.
  • Experience working with Sqoop to transfer data between MapR-FS and relational databases such as MySQL, and used Talend to drive Sqoop jobs.
  • Used Apache Spark on YARN for fast, large-scale data processing and improved performance (see the sketch after this list).
  • Involved in writing MapReduce jobs.
  • Implemented Zookeeper for the cluster to coordinate concurrent access.
  • Experience writing MapReduce jobs and streaming jobs.
  • Experience troubleshooting issues and failed jobs in the Hadoop cluster.
  • Able to tackle problems and complete the tasks planned for each sprint.
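
A minimal sketch of submitting a Spark application in YARN cluster mode, as referenced in the bullet above; the jar name, main class and resource settings are hypothetical assumptions rather than the actual project values.

  # Run a Spark job on YARN in cluster mode (class, jar and path names are placeholders)
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.analytics.DailyAggregation \
    --num-executors 10 \
    --executor-memory 4g \
    --executor-cores 2 \
    daily-aggregation.jar /data/input /data/output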

Environment: MapR-FS, MCS, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Storm, Spark, Scala, YARN, Zookeeper, Oozie, HBase, MapR-DB, Pentaho DI, Maven, Linux, Unix, Talend, MongoDB.

Confidential, Dallas, TX

Big Data Developer

Responsibilities:
  • Handled day-to-day activities, weekly status updates, and technical upgrade activities for various applications, along with process and performance improvements.
  • Remained flexible in taking up additional responsibilities when needed.
  • Involved in developing the Hadoop system and improving multi-node Hadoop cluster performance.
  • Involved in Hadoop cluster administration and maintained large volumes of storage.
  • Involved in running Oozie jobs daily, weekly or bi-monthly as required to report on MapR-FS storage and support capacity planning (see the sketch after this list).
  • Developed external tables in Hive that are queried to obtain the data required for analysis.
  • Created tables in Hive and wrote data into them using Talend Hive components.
  • Experience programming in Scala and Python.
  • Developed multiple POCs in Scala, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
  • Analyzed the Cassandra/SQL scripts and designed the solution to be implemented in Scala.
  • Migrated data from a traditional database to NoSQL (MongoDB) and analyzed the influx of data using Hadoop ecosystem tools to optimize business processes.
  • Experience managing and reviewing Hadoop log files.
  • Experience working with Sqoop to transfer data between MapR-FS and relational databases such as MySQL, and used Talend to drive Sqoop jobs.
  • Used Apache Spark on YARN for fast, large-scale data processing and improved performance.
  • Involved in writing MapReduce jobs.
  • Experience with Drill, which delivers secure, interactive SQL analytics at petabyte scale and is a popular SQL engine for big data.
  • Implemented Zookeeper for the cluster to coordinate concurrent access.
  • Experience writing MapReduce jobs and streaming jobs.
  • Experience troubleshooting issues and failed jobs in the Hadoop cluster.
  • Able to tackle problems and complete the tasks planned for each sprint.
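
A minimal sketch of how a recurring Oozie coordinator, such as the scheduled storage/capacity-planning job mentioned above, might be launched from the shell; the property values, paths and host names are hypothetical.

  # job.properties (illustrative values)
  #   nameNode=maprfs:///
  #   jobTracker=resourcemanager-host:8032
  #   oozie.coord.application.path=${nameNode}/user/etl/coord-storage-report

  # Submit and start the coordinator, then check on it later using the id printed by -run
  oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
  oozie job -oozie http://oozie-host:11000/oozie -info <coordinator-job-id>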

Environment: MapR-FS, MapR M4 & M5, MCS, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Spark, Scala, Python, YARN, Zookeeper, Oozie, HBase, MapR-DB, Pentaho DI, Maven, Linux, Unix, MongoDB.

Confidential, Jacksonville, FL

Hadoop Developer

Responsibilities:
  • Worked on analyzing data in the Hadoop cluster using different big data analytic tools, including Hive and MapReduce.
  • Involved in increasing system performance by adding real-time components such as Flume and Spark to the platform.
  • Installed and configured Spark, Flume, Zookeeper, Ganglia and Nagios on the Hadoop cluster.
  • Hands-on experience implementing Spark with Scala.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed MapReduce programs for data analysis and data cleaning.
  • Worked with the Apache Crunch library to write, test and run MapReduce pipeline jobs.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Continuously monitored and provisioned the Hadoop cluster through Cloudera Manager.
  • Created Hive tables, loaded data into them and wrote Hive UDFs.
  • Worked on Impala to obtain fast results without any transformation of the data.
  • Worked on Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate (see the sketch after this list).
  • Used Tableau for visualizing and analyzing the data.
  • Experience using the Solr search engine for indexing and searching data.
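
A minimal shell sketch of the Kafka side of the real-time ingestion described above, using the console tools from Kafka versions of that era; the topic name, broker and Zookeeper addresses are hypothetical.

  # Create a topic for the incoming event stream
  kafka-topics.sh --create --zookeeper zkhost:2181 \
    --replication-factor 3 --partitions 6 --topic clickstream-events

  # Smoke test: push one sample message and read it back
  echo '{"user":"u1","page":"/home"}' | \
    kafka-console-producer.sh --broker-list broker1:9092 --topic clickstream-events
  kafka-console-consumer.sh --zookeeper zkhost:2181 \
    --topic clickstream-events --from-beginning --max-messages 1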

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.0, Sqoop, Kafka, Storm, Oozie, HBase, Spark, Scala, Cloudera Manager, Crunch, Tableau, Linux, Unix.

Confidential, Dallas, TX

Big Data Software Developer

Responsibilities:
  • Involved in designing the Hadoop architecture.
  • Responsible for administering the Hadoop system on the Confidential BigInsights distribution, including commissioning and decommissioning data nodes, cluster performance, maintaining cluster health and monitoring the system in the web console.
  • Worked on importing and exporting data between relational database systems such as DB2 and HDFS/Hive using Sqoop.
  • Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Developed automated job flows run through Oozie daily and on demand; these workflows execute MapReduce jobs internally.
  • Wrote HiveQL queries against Hive external tables created over HBase and generated reports from the data for analysis (see the sketch after this list).
  • Developed Pig scripts used as an ETL tool to transform and aggregate data before loading it into HDFS.
  • Experience with Storm and Kafka to ingest streams of data.
  • Worked on Apache Solr, used as an indexing and search engine.
  • Developed unit test cases for MapReduce code using MRUnit.
  • Experience with Big SQL, a low-latency interactive SQL engine that is very useful for the business.
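
A minimal sketch, with hypothetical table and column names, of how a Hive external table can be mapped onto an HBase table and queried for a report, as in the bullet above.

  # Map a Hive external table onto an existing HBase table, then run a report query
  hive -e "
    CREATE EXTERNAL TABLE IF NOT EXISTS claims_hbase (
      rowkey STRING,
      claim_amount DOUBLE,
      claim_state STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:amount,cf:state')
    TBLPROPERTIES ('hbase.table.name' = 'claims');

    SELECT claim_state, COUNT(*) AS claim_count, SUM(claim_amount) AS total_amount
    FROM claims_hbase
    GROUP BY claim_state;
  "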

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, Confidential BigInsights V2.0, Sqoop, Kafka, Storm, Lucene, Oozie, HBase, Big SQL, Java and Red Hat Enterprise Linux, Unix.

Confidential

Java Developer

Responsibilities:
  • Developed web components using JSP, Servlets and JDBC.
  • Analyzed use cases to understand the business requirements and to assess the technical implementation of the functionality.
  • Used the JavaMail API extensively to send automated emails whenever the ticket status or workflow steps changed.
  • Designed, implemented, tested and deployed Enterprise JavaBeans (both session and entity beans) using WebLogic as the application server.
  • Used tools such as TOAD for SQL operations on the Oracle database.
  • Developed database interaction code with the JDBC API, making extensive use of SQL query statements and advanced PreparedStatements.
  • Used connection pooling through the JDBC interface for optimal performance.
  • Used EJB entity and session beans to implement business logic, session handling and transactions. Developed user interfaces using JSP, Servlets and JavaScript.
  • Wrote complex SQL queries and stored procedures.
  • Used JavaScript for client-side validation.

Environment: JSPs, Servlets, JavaBeans, UML, JDK 1.5, Oracle, TOAD, JavaScript, HTML and CSS.

Confidential

Java Developer

Responsibilities:
  • Developed user interface templates using Spring MVC and JSP.
  • Involved in developing form validations using SimpleFormController.
  • Responsible for implementing controllers such as SimpleFormController.
  • Implemented design patterns such as DAO, Singleton, Business Delegate and Strategy.
  • Used the Spring 2.0 framework to implement the Spring MVC design pattern.
  • Designed, developed and deployed the J2EE components on Tomcat.
  • Used Hibernate for OR mapping on the Oracle database.
  • Involved in transaction management and AOP using Spring.

Environment: Java/J2EE, JSP, Spring 2.0 framework, Oracle, Hibernate.
