Big Data Engineer / Hadoop Developer Resume
SUMMARY
- 5+ years of experience with strong emphasis on Design, Development, Implementation, Testing and Deployment of Software Applications.
- Over 5 years of comprehensive IT experience in Big Data and Big Data Analytics: Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem and shell scripting.
- Highly capable of processing large sets of structured, semi-structured and unstructured data and supporting Big Data applications.
- Expertise in transferring data between the Hadoop ecosystem and structured data stores in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
- Experience with Apache Spark clusters and stream processing using Spark Streaming.
- Expertise in moving large volumes of log, streaming event and transactional data using Flume.
- Experience in developing MapReduce jobs in Java for data cleaning and pre-processing.
- Expertise in writing Pig Latin and Hive scripts and extending their functionality using User Defined Functions (UDFs).
- AWS certified - AWS Certified Solutions Architect - Associate.
- Good knowledge of Hadoop, HBase, Hive, Pig Latin scripts, MapReduce, Sqoop, Flume and HiveQL.
- Experience in analyzing data using Pig Latin, HiveQL and HBase.
- Capturing data from existing databases that provide SQL interfaces using Sqoop.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice versa.
- Implemented Proofs of Concept on the Hadoop stack and different big data analytic tools, and migration from different databases (e.g. Teradata, Oracle, MySQL) to Hadoop.
- Worked on NoSQL databases including HBase, Cassandra and MongoDB.
- Successfully loaded files into Hive and HDFS from MongoDB and HBase.
- Experience in configuring Hadoop Clusters and HDFS.
- Expertise in organizing data layouts in Hive using partitions and bucketing (see the sketch after this list).
- Expertise in preparing interactive data visualizations from different sources using Tableau.
- Hands-on experience in developing workflows that execute MapReduce, Sqoop, Pig, Hive and shell script actions using Oozie.
- Experience working with the Cloudera Hue interface and Impala.
- Expertise in developing SQL queries and Stored Procedures, and excellent development experience with Agile methodology.
- Excellent leadership, interpersonal, problem solving and time management skills.
- Excellent communication skills both Written (documentation) and Verbal (presentation).
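As an illustration of the Hive partitioning and bucketing expertise noted above, a minimal sketch follows; the database, table, column names and bucket count are hypothetical and not taken from any specific project below.

```python
# Minimal sketch of a partitioned, bucketed Hive layout; analytics.transactions,
# its columns and the bucket count are assumptions for illustration only.
import subprocess

DDL = """
CREATE TABLE IF NOT EXISTS analytics.transactions (
    txn_id      BIGINT,
    customer_id BIGINT,
    amount      DOUBLE
)
PARTITIONED BY (txn_date STRING)             -- filters on txn_date prune whole partitions
CLUSTERED BY (customer_id) INTO 32 BUCKETS   -- joins/sampling on customer_id read fewer files
STORED AS PARQUET
"""

# Run the DDL through the Hive CLI; beeline against HiveServer2 would work the same way.
subprocess.check_call(["hive", "-e", DDL])
```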
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, and Zookeeper.
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, Python, Java, J2EE, PL/SQL, Pig Latin, HiveQL, Unix shell scripts, R Programming
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UNIX, RedHat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, ANT, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
PROFESSIONAL EXPERIENCE
Confidential
Big Data Engineer / Hadoop Developer
Responsibilities:
- Developed a process for Sqooping data from multiple sources like SQL Server, Oracle and Teradata.
- Responsible for creating the mapping document from source fields to destination fields.
- Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
- Developed Oozie workflows for executing Sqoop and Hive actions.
- Worked with NoSQL databases like HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
- Involved in building the database model, APIs and views using Python in order to build an interactive web-based solution.
- Performed optimizations on Spark/Scala code; diagnosed and resolved performance issues.
- Responsible for developing Python wrapper scripts which extract a specific date range using Sqoop by passing the custom properties required for the workflow (see the sketch at the end of this entry).
- Developed scripts to run Oozie workflows, capture the logs of all jobs that run on the cluster and create a metadata table which specifies the execution times of each job.
- Developed Hive scripts to perform transformation logic and load the data from the staging zone to the final landing zone.
- Developed monitoring and notification tools using Python.
- Worked with the Parquet file format for better storage and performance of published tables.
- Involved in loading transactional data into HDFS using Flume for Fraud Analytics.
- Developed Python utility to validate HDFS tables with source tables.
- Designed and developed UDFs to extend the functionality of both Pig and Hive.
- Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
- Managed datasets using Pandas data frames and MySQL; queried MySQL from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
- Automated all the jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizing it using SparkContext, Spark SQL, pair RDDs and Spark on YARN.
- Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH3, Oracle, Teradata, SQL Server, UNIX shell scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
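The Python wrapper pattern mentioned above (extracting a specific date range with Sqoop) can be sketched roughly as follows; the JDBC URL, credentials, table names and HDFS paths are placeholders, not details from the actual engagement.

```python
#!/usr/bin/env python
# Minimal sketch of a Python wrapper that Sqoops a specific date range into a
# staging directory; connection details, table and column names are assumptions.
import subprocess
import sys

def sqoop_date_range(table, date_col, start_date, end_date):
    target_dir = "/data/staging/{0}/{1}".format(table, start_date)
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:oracle:thin:@//dbhost:1521/ORCL",   # placeholder source
        "--username", "etl_user",
        "--password-file", "/user/etl/.pwd",                   # avoid plain-text passwords
        "--table", table,
        "--where", "{0} >= '{1}' AND {0} < '{2}'".format(date_col, start_date, end_date),
        "--target-dir", target_dir,
        "--num-mappers", "4",
        "--as-parquetfile",
    ]
    return subprocess.call(cmd)

if __name__ == "__main__":
    # e.g. ./sqoop_range.py ORDERS ORDER_DATE 2016-01-01 2016-02-01
    sys.exit(sqoop_date_range(*sys.argv[1:5]))
```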
Confidential
Big Data / Hadoop Developer
Responsibilities:
- Involved in end-to-end data processing: ingestion, processing, quality checks and splitting.
- Developed Spark scripts using Scala as per the requirements.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Performed different types of transformations and actions on the RDDs to meet the business requirements (see the sketch at the end of this section).
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Also worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, HBase and Sqoop.
- Involved in loading data from the UNIX file system into HDFS.
- Created HBase tables to store variable data formats of PII data coming from different portfolios.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Responsible for managing data coming from various sources.
- Experience loading and transforming large sets of structured, semi-structured and unstructured data.
- Provided cluster coordination services through Zookeeper.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Responsible for setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
- Involved in managing and reviewing Hadoop log files.
- Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Responsible for writing Hive queries for data analysis to meet the business requirements.
- Responsible for creating Hive tables and working on them using HiveQL.
- Responsible for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
- Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Importing the unstructured data into the HDFS using Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
- Involved in using the HBase Java API in a Java application.
- Automated all the jobs for extracting data from different data sources like MySQL and pushing the result sets to the Hadoop Distributed File System.
- Hands-on design and development of an application using Hive UDFs.
- Responsible for writing Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Supported data analysts in running Pig and Hive queries.
- Involved in writing HiveQL and Pig Latin scripts.
- Imported and exported data from MySQL/Oracle into Hive using Sqoop.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
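A rough sketch of the ingest-transform-publish pattern described in this section, using RDD transformations and actions and landing the result in a Hive table; the HDFS path, record format, schema and table name are assumptions.

```python
# Minimal sketch: ingest raw records from HDFS, clean them with RDD
# transformations, run an action, and publish the result to Hive.
from pyspark.sql import SparkSession, Row

spark = (SparkSession.builder
         .appName("rdd-pipeline-sketch")
         .enableHiveSupport()
         .getOrCreate())
sc = spark.sparkContext

raw = sc.textFile("hdfs:///data/landing/events/")             # ingestion
parsed = (raw.map(lambda line: line.split(","))               # transformation
              .filter(lambda f: len(f) == 3)                  # quality check
              .map(lambda f: Row(user=f[0], event=f[1], amount=float(f[2]))))

total_by_user = (parsed.map(lambda r: (r.user, r.amount))     # further transform
                        .reduceByKey(lambda a, b: a + b))
print(total_by_user.take(5))                                  # action

spark.createDataFrame(parsed).write.mode("overwrite") \
     .saveAsTable("analytics.events_clean")                   # publish to Hive
```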
Confidential
Big Data / Hadoop Developer
Responsibilities:
- Cluster monitoring and maintenance of the Cummins Cluster.
- Maintained multiple Hadoop clusters (min. 100 nodes), Hadoop ecosystems, third-party software, and databases with updates/upgrades, performance tuning and monitoring.
- Managed and supported Infoworks, the data ingestion and integration tool for the data lake.
- Supported, troubleshot and scheduled jobs running in the production cluster.
- Resolved issues, answered questions, and provided day-to-day support for users or clients related to Hadoop and its ecosystem.
- Managed and supported the Teradata EDW, including its client tools (Teradata Studio and SQL Assistant) connecting to the data lake.
- Installed, configured, and operated Zookeeper, Pig, Sqoop, Hive, HBase, Kafka, and Spark for business needs.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Experienced in dealing with Avro, Parquet and ORC files.
- Worked on installing, deploying, maintaining and securing nodes and multi-node clusters.
- Worked with JUnit, Maven, GitHub, Gradle, EasyMock, Jenkins and IntelliJ.
- Developed Kafka data pipelines with the help of Spark 1.6 and Scala (see the sketch at the end of this entry).
- Expert in machine learning and NLP.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
Environment: Hadoop, MapReduce 2.7.2, Hive 2.0, Pig 0.16, Sqoop 2, Java, Oozie, HBase 0.98.19, Kafka 0.10.1.1, Spark 2.0, Scala 2.12.0, Eclipse, Linux, Oracle, Teradata.
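The Kafka-to-Spark pipeline pattern mentioned above can be sketched as follows, using the Spark 1.6-era Python Streaming API; the broker address, topic name and record format are assumptions.

```python
# Minimal sketch of a direct Kafka stream processed in Spark Streaming micro-batches.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-pipeline-sketch")
ssc = StreamingContext(sc, batchDuration=30)   # 30-second micro-batches

stream = KafkaUtils.createDirectStream(
    ssc, ["sensor_events"], {"metadata.broker.list": "broker1:9092"})

# Each record is a (key, value) pair; keep only well-formed CSV values.
clean = (stream.map(lambda kv: kv[1].split(","))
               .filter(lambda fields: len(fields) == 4))

# Count events per device in each batch and print a sample to the driver log.
counts = clean.map(lambda f: (f[0], 1)).reduceByKey(lambda a, b: a + b)
counts.pprint()

ssc.start()
ssc.awaitTermination()
```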
Confidential
Big Data / Hadoop Developer
Responsibilities:
- Worked on the Hortonworks HDP 2.5 distribution.
- Responsible for building scalable, distributed data solutions using Hadoop.
- Involved in importing data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Played a key role in dynamic partitioning and bucketing of the data stored in Hive.
- Wrote HiveQL queries integrating different tables and created views to produce result sets.
- Collected the log data from web servers and integrated it into HDFS using Flume.
- Experienced in loading and transforming large sets of structured and unstructured data.
- Used MapReduce programs for data cleaning and transformations and loaded the output into Hive tables in different file formats.
- Wrote MapReduce programs to handle semi-structured and unstructured data such as JSON, Avro data files and sequence files for log files.
- Created data pipelines for different events to load data from DynamoDB into an AWS S3 bucket and then into an HDFS location.
- Involved in loading data into the HBase NoSQL database.
- Built, managed and scheduled Oozie workflows for end-to-end job processing.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Analyzed large volumes of structured data using Spark SQL.
- Wrote shell scripts to execute HiveQL.
- Used Spark as an ETL tool.
- Wrote automated shell scripts in a Linux/Unix environment using Bash.
- Migrated HiveQL queries to Spark SQL to improve performance (see the sketch at the end of this entry).
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into HBase.
- Experienced in using the DataStax Spark connector to store data into and retrieve data from a Cassandra database.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
Environment: Hortonworks, Hadoop, HDFS, Pig, Sqoop, Hive, Oozie, Zookeeper, NoSQL, HBase, Shell Scripting, Scala, Spark, Spark SQL.
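A rough sketch of moving a HiveQL aggregation into Spark SQL and landing the result in Cassandra through the DataStax Spark connector, as mentioned above; the keyspace, table and column names, and the Cassandra host, are assumptions.

```python
# Minimal sketch: the same query that previously ran as HiveQL, executed by Spark SQL,
# with the result written to Cassandra. Requires the spark-cassandra-connector
# package on the classpath (e.g. submitted with --packages).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("sparksql-migration-sketch")
         .config("spark.cassandra.connection.host", "cassandra-host")
         .enableHiveSupport()
         .getOrCreate())

daily_totals = spark.sql("""
    SELECT txn_date, customer_id, SUM(amount) AS total_amount
    FROM analytics.transactions
    GROUP BY txn_date, customer_id
""")

(daily_totals.write
    .format("org.apache.spark.sql.cassandra")
    .options(table="daily_totals", keyspace="analytics")
    .mode("append")
    .save())
```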
Confidential
Hadoop Developer
Responsibilities:
- Migrating an existing Java application into microservices using Spring Boot and Spring Cloud.
- Working knowledge of different IDEs such as Eclipse and Spring Tool Suite.
- Working knowledge of Git and ANT/Maven for project dependency management, builds and deployment.
- Developing simple and complex MapReduce programs in Java for data analysis on different data formats.
- Developing Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Working as part of the AWS build team.
- Creating, configuring and managing S3 buckets (storage).
- Experience with AWS EC2, EMR, Lambda and CloudWatch.
- Importing data from different sources like HDFS/HBase into Spark RDDs.
- Experience with batch processing of data sources using Apache Spark and Elasticsearch.
- Experience implementing Spark RDD transformations and actions to carry out business analysis.
- Migrating HiveQL queries on structured data into Spark SQL to improve performance.
- Optimizing MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Working on partitioning Hive tables and running the scripts in parallel to reduce their run time.
- Working on data serialization formats for converting complex objects into sequences of bits using Avro, Parquet, JSON and CSV formats.
- Responsible for analyzing and cleansing raw data by performing Hive/Impala queries and running Pig scripts on the data.
- Administering, installing, upgrading and managing distributions of Hadoop, Hive and HBase.
- Involved in troubleshooting and performance tuning of Hadoop clusters.
- Creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Developing Spark scripts using Python shell commands as per the requirement to read/write JSON files (see the sketch at the end of this entry).
Environment: Hadoop, HDFS 2.6.3, Hive 1.0.1, HBase 0.98.12.1, Zookeeper 3.5.1, Oozie, Impala 1.4.1, Java (JDK 1.6), Cloudera CDH3, Oracle, Teradata, SQL Server, UNIX shell scripting, Flume 1.6.0, Scala 2.11.6, Spark 1.5.0, Sqoop 1.4.6, Python 3.5.1.
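A minimal sketch of the PySpark JSON read/write script mentioned above; the HDFS paths and field names are assumptions.

```python
# Minimal sketch: read raw JSON events, apply simple transformations, write JSON back.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-etl-sketch").getOrCreate()

events = spark.read.json("hdfs:///data/raw/events/")                 # read JSON files

cleaned = (events
           .filter(F.col("event_type").isNotNull())                  # drop bad records
           .withColumn("event_date", F.to_date(F.col("event_ts"))))  # derive a date column

(cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .json("hdfs:///data/curated/events/"))                           # write JSON back out
```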