
Big Data Engineer Resume


San Francisco, CA

SUMMARY:

  • 8 years of IT industry experience encompassing a wide range of skills.
  • 4+ years of experience working with Big Data technologies on highly distributed systems comprising several applications and massive amounts of data, using the Cloudera, MapR, and Confidential BigInsights Hadoop distributions.
  • Good understanding of AWS and Amazon EMR.
  • Good knowledge of and experience with workflow managers such as Oozie, Azkaban, and Luigi.
  • Extensive Hadoop experience, performing the duties of both Hadoop Developer and Hadoop Administrator across different projects.
  • Hands-on experience with Hadoop ETL using different ETL tools.
  • Extensive experience developing Spark Streaming jobs using RDDs (Resilient Distributed Datasets) and Spark SQL as required.
  • Excellent knowledge of Hadoop architecture, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
  • Experience developing Java MapReduce jobs for data cleaning and data manipulation as required by the business.
  • Good understanding of HDFS concepts, including data replication, high availability, reading/writing data to HDFS, and data flow.
  • Good knowledge of and experience in setting up Hadoop clusters on different distributions.
  • Strong knowledge of the Hadoop ecosystem, including HDFS, Hive, Oozie, HBase, Pig, Sqoop, and Zookeeper.
  • Experience administering and monitoring Hadoop clusters, including commissioning and decommissioning nodes, file system checks, cluster maintenance, and upgrades (a decommissioning sketch follows this list).
  • Experience with the Cloudera, MapR, and Confidential distributions.
  • Successfully migrated a Hadoop cluster from one distribution to another.
  • Good experience with Hadoop YARN, the Hadoop cluster resource management system.
  • Good experience importing and exporting data between HDFS/Hive and relational database systems such as MySQL using Sqoop (a Sqoop sketch also follows this list).
  • Experience running Oozie jobs daily, weekly, or bi-monthly as needed by the business; these jobs execute as MapReduce internally.
  • Experience with the ETL and data visualization tool Pentaho Data Integration; created jobs and transformations that simplify analysis and routine operations.
  • Good knowledge of NoSQL databases, including HBase, MongoDB, and MapR-DB.
  • Installation, configuration, and administration experience with big data platforms: Cloudera Manager for Cloudera and MCS for MapR.
  • Installed the open-source monitoring tools Nagios and Ganglia in different environments.
  • Maintained and analyzed large data sets, terabytes in size, efficiently.
  • Ran Spark in YARN cluster mode for faster performance.
  • Installed and configured Pentaho Data Integration in different environments.
  • Executed complex HiveQL queries to extract required data from Hive tables and wrote Hive UDFs as required.
  • Monitored MapReduce jobs and YARN applications.
  • Good knowledge of Apache Solr, used as the search engine in different distributions.
  • Extensive experience with object-oriented analysis and design, Java/J2EE technologies, and web services.
  • Extensive experience working with Oracle, MS SQL Server, DB2, and MySQL.
  • Experience with SDLC, Agile, and hybrid methodologies.
  • Experience in the Healthcare, Banking, and Telecom industries.
  • Ability to meet deadlines without compromising the quality of deliverables.
  • Excellent communication, interpersonal, and problem-solving skills; a team player.
  • Ability to quickly adapt to new environments and technologies.
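
A minimal sketch of the node decommissioning flow mentioned above, assuming a stock HDFS setup where dfs.hosts.exclude in hdfs-site.xml points at /etc/hadoop/conf/dfs.exclude; the hostname and path are placeholders:

  # Add the node to the HDFS exclude file (path is an assumed location).
  echo "datanode05.example.com" >> /etc/hadoop/conf/dfs.exclude

  # Tell the NameNode to re-read the include/exclude lists; the node
  # begins replicating its blocks elsewhere and enters "Decommissioning".
  hdfs dfsadmin -refreshNodes

  # Watch progress until the node reports "Decommissioned".
  hdfs dfsadmin -report

  # Routine file system health check, as used during cluster maintenance.
  hdfs fsck /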
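
And a minimal Sqoop import/export sketch along the lines of the MySQL transfers described above; the connection string, credentials, table names, and paths are hypothetical:

  # Import a MySQL table into HDFS with 4 parallel map tasks.
  sqoop import \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table orders \
    --target-dir /user/etl/orders \
    --num-mappers 4

  # Export aggregated results from HDFS back into MySQL.
  sqoop export \
    --connect jdbc:mysql://dbhost:3306/sales \
    --username etl_user -P \
    --table order_summary \
    --export-dir /user/hive/warehouse/order_summary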

TECHNICAL SKILLS:

Big Data Technologies: HDFS, Hadoop, Hive, Pig, Oozie, Zookeeper, Impala, Sqoop, MapReduce, Tez, Spark, Flume, HBase, MongoDB, Solr, Kafka, YARN, Avro, Storm

Distributions: Cloudera, MapR, Confidential BigInsights, Hortonworks

Java Technologies: Core Java, JSP, Servlets, Spring, Hibernate

Monitoring Tools: Cloudera Manager, Ambari, MapR Control System, Confidential Platform Cluster Manager

Programming Languages: Java, SQL, Scala, Pig Latin, HiveQL, Shell Scripting

Databases: NoSQL (HBase, MapR-DB, Cassandra, MongoDB), Oracle 12c/11g, MySQL, DB2, MS SQL Server

Operating Systems: Windows, Linux (RHEL, CentOS, Ubuntu)

ETL & BI Tools: Tableau, Pentaho Data Integration, Talend, Informatica, DataStage

Testing Methodologies: JUnit, MRUnit

Software, Tools & Other Technologies: Eclipse, PuTTY, Cygwin, Hue, JIRA, IntelliJ IDEA, NetBeans, Maven, Log4j, Jenkins, GitHub

PROFESSIONAL EXPERIENCE:

Confidential, San Francisco, CA

Big Data Engineer

Responsibilities:

  • Involved in the development of the Hadoop system and in improving multi-node Hadoop cluster performance.
  • Involved in Hadoop cluster administration, successfully maintaining large volumes of storage.
  • Experience working with Talend for Hadoop ETL.
  • Used Spark for clickstream analysis.
  • Developed Spark Streaming jobs using RDDs and built DataFrames with Spark SQL as needed.
  • Ran Oozie jobs daily, weekly, or bi-monthly as required to report on MapR-FS storage and support capacity planning.
  • Developed MapReduce jobs for data cleaning and manipulation.
  • Involved in defining job flows and running data streaming jobs to process terabytes of data.
  • Involved in managing and reviewing Hadoop log files.
  • Implemented partitioning and bucketing of data in Hive for faster query performance (a Hive sketch follows this list).
  • Developed external tables in Hive and wrote HiveQL queries against them to obtain the data required for analysis.
  • Created Hive tables and loaded data into them using Talend Hive components.
  • Administered the cluster on the MapR distribution, including commissioning and decommissioning data nodes, backup and recovery, cluster performance, and maintaining a healthy cluster, using MCS for cluster monitoring.
  • Used Sqoop to transfer data between MapR-FS and relational databases such as MySQL, and used Talend for the same purpose.
  • Installed Nagios and Ganglia, open-source tools for monitoring the Hadoop cluster and viewing its health.
  • Created Pig Latin scripts to extract the required data from a large data lake.
  • Ran Apache Spark on YARN for fast, large-scale data processing (a spark-submit sketch also follows this list).
  • Used Maven to build and manage the Spark and MapReduce projects, written in Scala and Java.
  • Worked with Apache Drill, which delivers secure, interactive SQL analytics at petabyte scale.
  • Tackled problems and completed the tasks committed to for each sprint.
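
A minimal sketch of the Hive partitioning and bucketing approach referenced above; the table name, columns, and HDFS location are hypothetical:

  hive -e "
  CREATE EXTERNAL TABLE IF NOT EXISTS clicks (
    user_id STRING,
    url     STRING,
    ts      STRING
  )
  PARTITIONED BY (event_date STRING)      -- prunes scans to the dates queried
  CLUSTERED BY (user_id) INTO 32 BUCKETS  -- speeds up joins/sampling on user_id
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  LOCATION '/data/clicks';"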
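
And a sketch of launching a Spark job in YARN cluster mode, as mentioned above; the class name, jar, and resource settings are hypothetical placeholders:

  # Submit a Scala/Java Spark job to YARN; in cluster deploy mode the
  # driver runs inside the cluster rather than on the gateway node.
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --class com.example.ClickstreamJob \
    --num-executors 8 \
    --executor-memory 4g \
    --executor-cores 2 \
    clickstream-assembly.jar /data/clicks /data/clicks_out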

Environment: MapR-FS, MapR M4/M5, MCS, Maven, Hue, MapReduce, Hive, Pig, Sqoop, Kafka, Talend, Spark, Spark SQL, Zookeeper, Oozie, HBase, MapR-DB, Pentaho DI, Java, Scala, Eclipse, Linux

Confidential, Dallas, TX

Big Data Software Developer

Responsibilities:

  • Coordinated with the Hadoop administrator on administering the Hadoop system on the Confidential BigInsights distribution, including commissioning and decommissioning data nodes, cluster performance, maintaining cluster health, and monitoring the system in the web console.
  • Imported and exported data between relational database systems such as DB2 and HDFS/Hive using Sqoop.
  • Worked with data delivery teams to set up new Hadoop users, which includes creating Linux accounts, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users (see the sketch after this list).
  • Developed multiple MapReduce jobs for data cleaning and preprocessing.
  • Developed automated job flows and ran them through Oozie daily and on demand; the workflows run MapReduce jobs internally.
  • Wrote HiveQL queries against Hive tables, developed the external tables required for ETL, and generated reports from the data for analysis.
  • Created jobs and transformations in Pentaho Data Integration to generate reports and transfer data from HBase to an RDBMS.
  • Implemented advanced procedures such as text analytics and processing using in-memory computing with Spark.
  • Processed unstructured data using Pig and Hive.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Worked on Apache Solr, used as an indexing and search engine.
  • Worked with Big SQL, a low-latency interactive SQL engine useful to the business.
  • Actively involved in code reviews, troubleshooting issues, and fixing bugs to improve performance.
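
A minimal sketch of the new-user onboarding flow described above; the account name, realm, and keytab path are hypothetical, exact kadmin usage varies by site, and the commands assume admin privileges:

  # Create a Linux account and a matching Kerberos principal.
  useradd -m analyst1
  kadmin -q "addprinc -randkey analyst1@EXAMPLE.COM"
  kadmin -q "xst -k /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM"

  # Authenticate as the new user and smoke-test HDFS and Hive access.
  kinit -kt /home/analyst1/analyst1.keytab analyst1@EXAMPLE.COM
  hdfs dfs -mkdir -p /user/analyst1 && hdfs dfs -ls /user/analyst1
  hive -e "SHOW DATABASES;"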

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, IBM BigInsights V2.x, Sqoop, Kafka, Spark, Lucene, Pentaho, Oozie, HBase, Big SQL, Java, Red Hat Enterprise Linux, DataStage.

Confidential, Kansas City, MO

Hadoop Developer

Responsibilities:

  • Analyzed and transformed data using big data analytics tools including Hive and MapReduce.
  • Improved system performance by adding real-time components such as Flume and Storm to the platform.
  • Installed and configured Storm, Flume, Zookeeper, Ganglia and Nagios on the Hadoop cluster.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs (see the Oozie sketch after this list).
  • Secured the system using Kerberos.
  • Developed MapReduce programs in Java for data analysis and data cleaning.
  • Involved in defining job flows, running data streaming jobs to process terabytes of text data.
  • Worked with Apache Crunch library to write, test and run MapReduce pipeline jobs.
  • Developed Pig Latin scripts for the analysis of semi-structured data.
  • Continuously monitored and provisioned the Hadoop cluster through Cloudera Manager.
  • Created Hive tables, loaded data into them, and wrote Hive UDFs.
  • Used Impala to obtain fast query results without transforming the data.
  • Used Kafka and Storm to ingest real-time data streams and push the data to HDFS or HBase as appropriate.
  • Used Tableau for visualizing and analyzing the data.
  • Used the Solr search engine for indexing and searching data.
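
A minimal sketch of driving a Hive-plus-Pig workflow through the Oozie CLI, as mentioned above; the Oozie URL, properties file, and job ID are hypothetical:

  # Submit and start a workflow whose actions include Hive and Pig jobs;
  # job.properties points at the workflow.xml directory in HDFS.
  oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run

  # Check the status of the workflow using the ID returned above.
  oozie job -oozie http://oozie-host:11000/oozie -info 0000001-200101000000000-oozie-W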

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH4.x, Sqoop, Kafka, Storm, Oozie, HBase, Cloudera Manager, Crunch, Tableau, Linux.

Confidential, Milpitas, CA

Hadoop Developer

Responsibilities:

  • Introduced and developed the architecture for a data platform service based on the open-source Apache Hadoop ecosystem, using HDFS, Solr, Impala, and Hive to ingest, store, index, and analyze big data.
  • Evaluated NoSQL data store solutions and delivered recommendations.
  • Migrated data from a traditional database to MongoDB (NoSQL) and analyzed the influx of data with Hadoop ecosystem tools to optimize business processes.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Migrated ETL processes from an RDBMS to Hive to simplify data manipulation.
  • Ran MapReduce jobs on MongoDB data and returned the results to MongoDB.
  • Migrated data from HDFS to MongoDB (see the sketch after this list).
  • Good understanding of how to choose among NoSQL databases for Hadoop.
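
One simple way to move HDFS output into MongoDB, matching the migration described above; the paths, host, and collection names are hypothetical, and the job output is assumed to be newline-delimited JSON (the mongo-hadoop connector is the heavier-weight alternative):

  # Pull the job output out of HDFS...
  hadoop fs -cat /user/etl/results/part-* > results.json

  # ...and load it into a MongoDB collection.
  mongoimport --host mongo-host:27017 \
    --db analytics --collection results \
    --file results.json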

Environment: Hadoop, MapReduce, Hive, HDFS, Pig, CDH3.x, Zookeeper, Sqoop, Oozie, MongoDB, Cloudera Manager, Linux.
