
Big Data Engineer / Hadoop Developer Resume


SUMMARY

  • 5+ years of overall experience with strong emphasis on Design, Development, Implementation, Testing and Deployment of Software Applications.
  • Over 5 years of comprehensive IT experience in Big Data and Big Data Analytics, Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and Shell Scripting.
  • Highly capable of processing large sets of structured, semi-structured and unstructured datasets and supporting Big Data applications.
  • Expertise in transferring data between a Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
  • Experience with Apache Spark clusters and stream processing using Spark Streaming.
  • Expertise in moving large amounts of log data, streaming event data and transactional data using Flume.
  • Experience in developing MapReduce jobs in Java for data cleaning and pre-processing.
  • Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
  • AWS Certified Solutions Architect - Associate.
  • Good knowledge of Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
  • Experience in analyzing data using Pig Latin, HiveQL and HBase.
  • Experience in capturing data from existing databases that provide SQL interfaces using Sqoop.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Implemented Proofs of Concept on the Hadoop stack and different big data analytic tools, including migration from different databases (i.e. Teradata, Oracle, MySQL) to Hadoop.
  • Worked on NoSQL databases including HBase, Cassandra and MongoDB.
  • Successfully loaded files to Hive and HDFS from MongoDB and HBase.
  • Experience in configuring Hadoop Clusters and HDFS.
  • Expertise in handling structured arrangement of data within certain limits (data layouts) using Partitioning and Bucketing in Hive.
  • Expertise in preparing interactive data visualizations using Tableau from different sources.
  • Hands-on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and shell scripts using Oozie.
  • Experience working with the Cloudera Hue interface and Impala.
  • Expertise in developing SQL queries and Stored Procedures, and excellent development experience with Agile Methodology.
  • Excellent leadership, interpersonal, problem solving and time management skills.
  • Excellent communication skills both Written (documentation) and Verbal (presentation).

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.

NoSQL Databases: HBase, Cassandra, MongoDB

Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming

Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery

Frameworks: MVC, Struts, Spring, Hibernate

Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8

Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP

Web/Application servers: Apache Tomcat, WebLogic, JBoss

Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata

Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP

PROFESSIONAL EXPERIENCE

Confidential

Big Data Engineer / Hadoop Developer

Responsibilities:

  • Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
  • Responsible for creating the mapping document from source fields to destination fields.
  • Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
  • Developed Oozie workflows for executing Sqoop and Hive actions.
  • Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Involved in building database models, APIs and views using Python in order to build an interactive web-based solution.
  • Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing custom properties required for the workflow.
  • Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster and create a metadata table which specifies the execution times of each job.
  • Developed Hive scripts for performing transformation logic and loading the data from the staging zone to the final landing zone.
  • Developed monitoring and notification tools using Python.
  • Worked on the Parquet file format to get better storage and performance for published tables.
  • Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
  • Developed Python utility to validate HDFS tables with source tables.
  • Designed and developed UDFs to extend the functionality in both Pig and Hive.
  • Imported and exported data using Sqoop between MySQL and HDFS on a regular basis.
  • Managed datasets using Pandas data frames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and MySQLdb package to retrieve information.
  • Automated all the jobs for pulling data from the FTP server to load data into Hive tables using Oozie workflows.
  • Involved in developing Spark code using Scala and Spark-SQL for faster testing and processing of data, and explored optimizations using SparkContext, Spark-SQL, pair RDDs and Spark on YARN (see the sketch after this list).
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
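Illustrative only (not taken from the original project): a minimal Scala sketch of the staging-to-landing pattern described above, assuming hypothetical table and path names (staging.transactions, /data/landing/transactions) and the Spark 1.x HiveContext API listed in the environment. It reads a Hive staging table with Spark-SQL, restricts it to a date range passed in by a wrapper script, and publishes the result as Parquet.

```scala
// Minimal sketch (hypothetical table/path names): read a Hive staging table with
// Spark-SQL, restrict it to a date range, and publish the result as Parquet.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object StagingToLanding {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("StagingToLanding"))
    val hiveContext = new HiveContext(sc)

    // Date range is passed in by the wrapper script, e.g. 2016-01-01 2016-01-31.
    val Array(startDate, endDate) = args

    val staged = hiveContext.sql(
      s"SELECT * FROM staging.transactions " +
      s"WHERE load_date BETWEEN '$startDate' AND '$endDate'")

    // Parquet gives better storage and scan performance for the published copy.
    staged.write.mode("overwrite").parquet("/data/landing/transactions")

    sc.stop()
  }
}
```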

Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH 3, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Involved in end-to-end data processing such as ingestion, processing, quality checks and splitting.
  • Developed Spark scripts by using Scala as per the requirement.
  • Loaded the data into Spark RDDs and performed in-memory data computations to generate the output response.
  • Performed different types of transformations and actions on the RDDs to meet the business requirements (see the sketch after this list).
  • Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyse data.
  • Also worked on analysing the Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
  • Involved in loading data from UNIX file system to HDFS.
  • Created HBase tables to store variable data formats of PII data coming from different portfolios.
  • Implemented best offer logic using Pig scripts and Pig UDFs.
  • Responsible to manage data coming from various sources.
  • Experience in loading and transforming large sets of structured, semi-structured and unstructured data.
  • Cluster coordination services through Zookeeper.
  • Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Analysed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
  • Involved in managing and reviewing Hadoop log files.
  • Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs.
  • Responsible for writing Hive queries for data analysis to meet the business requirements.
  • Responsible for creating Hive tables and working on them using HiveQL.
  • Responsible for importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
  • Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
  • Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
  • Importing the unstructured data into the HDFS using Flume.
  • Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
  • Used the HBase Java API in a Java application.
  • Automated all the jobs for extracting data from different data sources like MySQL and pushing the result sets to the Hadoop Distributed File System.
  • Hands on design and development of an application using Hive (UDF).
  • Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
  • Provided support to data analysts in running Pig and Hive queries.
  • Worked extensively with HiveQL and Pig Latin.
  • Imported and exported data from MySQL/Oracle to Hive using Sqoop.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
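Illustrative only: a minimal Scala sketch of the RDD transformation/action pattern referenced above, under the assumption that the raw records are comma-separated lines keyed by a portfolio id; the HDFS paths and field layout are hypothetical.

```scala
// Minimal sketch (hypothetical paths and field layout): load raw records into an RDD,
// apply transformations, and trigger the job with an action.
import org.apache.spark.{SparkConf, SparkContext}

object RddAggregation {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddAggregation"))

    // Assume each line is a comma-separated record with the portfolio id in column 0.
    val records = sc.textFile("hdfs:///data/raw/transactions")

    val counts = records
      .map(_.split(","))
      .filter(_.length > 1)            // transformation: drop malformed rows
      .map(fields => (fields(0), 1L))  // transformation: key by portfolio id
      .reduceByKey(_ + _)              // transformation: aggregate per key

    counts.saveAsTextFile("hdfs:///data/out/portfolio_counts")  // action: runs the job
    sc.stop()
  }
}
```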

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Cluster monitoring and maintenance of the Cummins Cluster.
  • Maintain multiple Hadoop clusters (min 100 nodes), Hadoop ecosystems, third party software, and database(s) with updates/upgrades, performance tuning and monitoring.
  • Manage and support 'info works', the data ingestion and integration tool for the data lake.
  • Support/Troubleshoot/Schedule jobs running in the Production cluster.
  • Resolve issues, answer questions, and provide support for users or clients on a day to day basis related to Hadoop and its ecosystem.
  • Manage and support the Teradata EDW including their client tools i.e. Teradata Studio and SQL Assistant connecting to the Data lake.
  • Install, configure, and operate Zookeeper, Pig, Sqoop, Hive, HBase, Kafka, and Spark for business needs.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced in dealing with AVRO, PARQUET & ORC files.
  • Worked on installing, deploying, maintaining and securing nodes and multi-node clusters.
  • Worked with JUnit, Maven, GitHub, Gradle, EasyMock, Jenkins and IntelliJ.
  • Developed Kafka data pipelines with the help of Spark 1.6 and Scala; expert in machine learning and NLP.
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access (see the sketch after this list).
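Illustrative only: a minimal Scala sketch of the dynamic-partitioning pattern above, run through Spark SQL with Hive support (matching the Spark 2.0 environment below); the database, table and column names are hypothetical, and bucketing would be declared analogously with a CLUSTERED BY clause in the DDL.

```scala
// Minimal sketch (hypothetical database/table/column names): create a partitioned Hive
// table and load it with dynamic partitioning through Spark SQL.
import org.apache.spark.sql.SparkSession

object HiveDynamicPartitions {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveDynamicPartitions")
      .enableHiveSupport()
      .getOrCreate()

    // Let Hive derive partition values from the data instead of static PARTITION clauses.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""CREATE TABLE IF NOT EXISTS warehouse.events_part (
                |  event_id STRING,
                |  payload  STRING)
                |PARTITIONED BY (event_date STRING)
                |STORED AS ORC""".stripMargin)

    // Dynamic partition insert: event_date is taken from the source rows themselves.
    spark.sql("""INSERT OVERWRITE TABLE warehouse.events_part PARTITION (event_date)
                |SELECT event_id, payload, event_date FROM staging.events""".stripMargin)

    spark.stop()
  }
}
```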

Environment: Hadoop, MapReduce 2.7.2, Hive 2.0, Pig 0.16, Sqoop 2, Java, Oozie, HBase 0.98.19, Kafka 0.10.1.1, Spark 2.0, Scala 2.12.0, Eclipse, Linux, Oracle, Teradata.

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Worked on the Hortonworks HDP 2.5 distribution.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
  • Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
  • Wrote HiveQL queries integrating different tables and creating views to produce result sets.
  • Collected the log data from Web Servers and integrated into HDFS using Flume.
  • Experienced in loading and transforming large sets of structured and unstructured data.
  • Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
  • Wrote MapReduce programs to handle semi-structured and unstructured data like JSON, Avro data files and sequence files for log files.
  • Created data pipelines for different events to load the data from DynamoDB to AWS S3 bucket and then into HDFS location.
  • Involved in loading data into HBaseNoSQL database.
  • Building, Managing and scheduling Oozie workflows for end to end job processing
  • Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
  • Analyzed large volumes of structured data using Spark SQL.
  • Wrote shell scripts to execute HiveQL.
  • Used Spark as an ETL tool.
  • Wrote automated shell scripts in a Linux/Unix environment using bash.
  • Migrated HiveQL queries into Spark SQL to improve performance.
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into HBase.
  • Experienced in using the DataStax Spark Cassandra Connector to store data into and retrieve data from a Cassandra database (see the sketch after this list).
  • Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
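Illustrative only: a minimal Scala sketch of the DataStax Spark Cassandra Connector usage described above; the keyspace, table, column names and connection host are hypothetical.

```scala
// Minimal sketch (hypothetical keyspace/table/columns and host): write an RDD to
// Cassandra and read it back with the DataStax Spark Cassandra Connector.
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object CassandraRoundTrip {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraRoundTrip")
      .set("spark.cassandra.connection.host", "cassandra-host")  // assumed host
    val sc = new SparkContext(conf)

    // Store a few (event_id, payload) pairs into the events table.
    val events = sc.parallelize(Seq(("e1", "started"), ("e2", "completed")))
    events.saveToCassandra("analytics", "events", SomeColumns("event_id", "payload"))

    // Pull the same rows back as CassandraRow objects and print a sample.
    sc.cassandraTable("analytics", "events").take(10).foreach(println)

    sc.stop()
  }
}
```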

Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL.

Confidential

Hadoop Developer

Responsibilities:

  • Migrating an existing Java application into microservices using Spring Boot and Spring Cloud.
  • Working knowledge in different IDEs like Eclipse, Spring Tool Suite.
  • Working knowledge of using GIT, ANT/Maven for project dependency / build / deployment.
  • Developing simple and complex MapReduce programs in Java for Data Analysis on different data formats.
  • Developing Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data
  • Working as a part of AWS build team.
  • Creating, configuring and managing S3 buckets (storage).
  • Experience with AWS EC2, EMR, Lambda and CloudWatch.
  • Importing the data from different sources like HDFS/HBase into Spark RDD.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in implementing Spark RDD transformations and actions to implement business analysis.
  • Migrating HiveQL queries on structured data into Spark SQL to improve performance.
  • Optimizing MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Working on partitioning HIVE tables and running the scripts in parallel to reduce run-time of the scripts.
  • Working on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
  • Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on data.
  • Administering, installing, upgrading and managing distributions of Hadoop, Hive and HBase.
  • Involved in troubleshooting and performance tuning of Hadoop clusters.
  • Creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets the data from Kafka in near real time (see the sketch after this list).
  • Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
  • Developing Spark scripts using Python shell commands as per the requirement to read/write JSON files.
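Illustrative only: a minimal Scala sketch of a direct Kafka stream consumed by Spark Streaming (spark-streaming-kafka 0.8 API, matching the Spark 1.5 environment below); the broker address and topic name are hypothetical.

```scala
// Minimal sketch (hypothetical broker and topic): consume a Kafka topic with Spark
// Streaming's direct API and run a simple transformation/action per micro-batch.
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaLearnerStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaLearnerStream")
    val ssc = new StreamingContext(conf, Seconds(10))  // 10-second micro-batches

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")  // assumed broker
    val topics = Set("learner-events")                               // assumed topic

    // Direct stream: each record arrives as a (key, value) pair of strings.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, topics)

    // Transformation/action on the fly: count the events in each micro-batch.
    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```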

Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH 3, Oracle, Teradata, SQL Server, UNIX Shell Scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
