Hadoop Developer Resume
Philadelphia, PA
SUMMARY
- 7 years of experience in IT, including Big Data technologies, the Hadoop ecosystem, Data Warehousing, and SQL-related technologies.
- Extensive experience as a Hadoop Developer and Big Data Analyst. Primary technical skills in HDFS, MapReduce, YARN, Hive, Sqoop, HBase, CA7, Flume, Oozie, and Zookeeper.
- Working experience with Big Data and the Hadoop Distributed File System (HDFS). In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in developing MapReduce programs using Apache Hadoop for Big Data analysis.
- Hands-on experience working with ecosystem components such as Hive, Sqoop, Spark, MapReduce, Flume, and Oozie.
- Knowledge of Scala language features: language fundamentals, classes, objects, traits, collections, case classes, higher-order functions, pattern matching, extractors, etc.
- Experience in creating internal and external Hive tables and implementing performance-improvement techniques such as partitioning and bucketing (see the sketch after this list).
- Developed Sqoop scripts to import data from RDBMS to Hive and HDFS, and to export data from HDFS back to RDBMS.
- Loaded data into HDFS from dynamically generated files and relational database management systems using Sqoop.
- Experience handling different file formats such as JSON, Avro, ORC, and Parquet.
- Knowledge of job workflow scheduling and monitoring tools like Oozie.
- Experience with RDBMS databases such as MySQL and Oracle, and expertise in writing SQL and HiveQL queries.
- Experience with scripting languages such as Bash and UNIX shell scripting, and knowledge of Python.
- Expertise in preparing test cases, documenting, and performing unit and integration testing.
- Working experience with data ingestion tools such as Apache NiFi, as well as loading data into a common data lake using HiveQL.
- Developed wrapper shell scripts to schedule HiveQL-based data loads as batch jobs.
- Experience in various phases of Agile Development such as Requirement, Analysis, Design, Development, and Unit Testing.
- Developed Sqoop scripts for importing large datasets from RDBMS to HDFS. Knowledge of creating UDFs in Java and registering them in Pig and Hive.
- Experience with Spark Streaming and Apache Kafka for ingesting live streaming data.
- Developed Spark scripts using the Scala shell as per requirements.
- Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
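The following is a minimal, illustrative sketch of the internal/external Hive tables with partitioning and bucketing referred to above; database, table, and column names are hypothetical, and the DDL is issued through a Spark session in Scala (it could equally be run directly in the Hive CLI or Beeline).

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    // Hive support is required so the DDL goes against the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw files already landed in HDFS (hypothetical path and schema).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_raw (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/data/raw/orders'""".stripMargin)

    // Managed (internal) table, partitioned by date for pruning and bucketed
    // by customer_id to speed up joins and sampling.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated.orders (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    spark.stop()
  }
}
```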
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS
Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions
Web Technologies: HTML, CSS, JavaScript, XML, JSP, Restful, SOAP
Operating Systems: Windows (98/2000/XP/7/8/10), Mac OS, UNIX, Linux, Ubuntu, CentOS
Build Automation tools: SBT, Ant, Maven
Version Control: Git, GitHub, Bitbucket
IDE & Build Tools: Eclipse, Visual Studio, NetBeans, JUnit, SQL Developer, MySQL Workbench
Databases: Oracle, SQL Server, MySQL, MS Access; NoSQL databases (HBase, MongoDB).
Cloud Technologies: MS Azure, Amazon Web Services, .NET Core and ASP.NET Core
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential - Philadelphia, PA
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Developed Spark APIs to import data from Teradata into HDFS and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data from the Avro tables into the Parquet tables (see the first sketch after this list).
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job with an HTTP source and an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the second sketch after this list).
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
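A minimal sketch of the Avro-to-Parquet load described in the partitioned/bucketed tables bullet above: it reads an Avro-backed Hive staging table and rewrites it as a Snappy-compressed, date-partitioned Parquet table. Database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-load")
      .enableHiveSupport()
      .getOrCreate()

    // Snappy is Spark's default Parquet codec; set it explicitly for clarity.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

    // Read the Avro-backed staging table registered in the Hive metastore and
    // rewrite it as a Parquet table partitioned by event_date.
    spark.table("staging.events_avro")
      .write
      .format("parquet")
      .partitionBy("event_date")
      .mode("overwrite")
      .saveAsTable("curated.events_parquet")

    spark.stop()
  }
}
```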
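And a minimal sketch of a Kafka consumer in Scala of the kind mentioned above; broker addresses, topic, and group id are hypothetical, and a Kafka 2.x client (poll with a Duration) is assumed.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object EventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("group.id", "hive-loader")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("enable.auto.commit", "false")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("events"))

    try {
      while (true) {
        val records = consumer.poll(Duration.ofMillis(500))
        for (record <- records.asScala) {
          // Downstream, each payload would be parsed and merged into Hive; here it is just printed.
          println(s"offset=${record.offset()} key=${record.key()} value=${record.value()}")
        }
        consumer.commitSync() // commit offsets only after the batch has been handled
      }
    } finally {
      consumer.close()
    }
  }
}
```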
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
Hadoop/Big Data Developer
Confidential - Charlotte, NC
Responsibilities:
- Responsible for architecting Hadoop clusters with CDH3; involved in installing CDH3 and upgrading it to CDH4
- Worked on creating keyspaces in Cassandra for saving Spark batch output
- Worked on a Spark application to compact the small files in the Hive warehouse into files close to the HDFS block size (see the sketch after this list)
- Managed migration of on-premises servers to AWS by creating golden images for upload and deployment
- Implemented real-time streaming ingestion using Kafka and Spark Streaming
- Loaded data using Spark Streaming with Scala and Python
- Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time processing using Spark, Kafka, and Scala
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions
- Developed a full-text search platform using NoSQL, Logstash, and Elasticsearch, allowing for much faster, more scalable, and more intuitive user searches
- Developed Sqoop scripts to move data between Pig and the MySQL database
- Worked on performance enhancements in Pig, Hive, and HBase across multiple nodes
- Worked with Distributed n-tier architecture and Client/Server architecture
- Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Developed MapReduce applications using Hadoop and HBase
- Evaluated Oozie for workflow orchestration and gained experience in cluster coordination using Zookeeper
- Developed ETL jobs following organization- and project-defined standards and processes
- Experienced in enabling Kerberos authentication in the ETL process
- Implemented data access using Hibernate persistence framework
- Designed the GUI using the Model-View-Controller architecture (Struts framework)
- Integrated Spring DAO for data access using Hibernate and was involved in developing Spring Framework controllers
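A minimal sketch of the small-file compaction mentioned above, assuming a hypothetical Hive table and a 128 MB HDFS block size: the total input size is measured, the number of output files is derived from it, and the data is rewritten into a compacted table.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object SmallFileCompaction {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("small-file-compaction")
      .enableHiveSupport()
      .getOrCreate()

    val blockSizeBytes = 128L * 1024 * 1024   // target roughly one HDFS block per output file
    val df = spark.table("warehouse.clickstream")

    // Sum the sizes of the table's underlying files to decide how many output files are needed.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val inputBytes = df.inputFiles.map { f =>
      val p = new Path(f)
      p.getFileSystem(hadoopConf).getFileStatus(p).getLen
    }.sum
    val numFiles = math.max(1, (inputBytes / blockSizeBytes).toInt)

    // Rewrite into a separate compacted table (Spark cannot overwrite a table it is reading from).
    df.coalesce(numFiles)
      .write
      .mode("overwrite")
      .saveAsTable("warehouse.clickstream_compacted")

    spark.stop()
  }
}
```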
Environment: Hadoop 2.X, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Java, J2EE, Eclipse, HQL.
Sr. Hadoop/Spark Developer
Confidential - Charlotte, NC
Responsibilities:
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce on EC2.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Worked with different source data file formats such as JSON, CSV, and TSV.
- Imported data from various data sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
- Involved in developing Impala scripts for ad-hoc queries.
- Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
- Involved in importing and exporting data from HBase using Spark.
- Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment (see the sketch after this list).
- Actively participated in code reviews and meetings and resolved technical issues.
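A minimal sketch of the kind of Hive-to-Spark migration explored in the POC mentioned above: a HiveQL aggregation (shown in the comment) re-expressed as a Spark DataFrame pipeline. Table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, sum}

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-poc")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) AS total
    //   FROM sales.transactions
    //   WHERE txn_date >= '2017-01-01'
    //   GROUP BY region;
    val result = spark.table("sales.transactions")
      .filter(col("txn_date") >= lit("2017-01-01"))
      .groupBy("region")
      .agg(sum("amount").as("total"))

    result.write.mode("overwrite").saveAsTable("sales.region_totals")

    spark.stop()
  }
}
```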
Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza, and UNIX Shell Scripting.
Hadoop Developer
Confidential
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster
- Imported and exported data to and from HDFS and Hive using Sqoop
- Experienced in defining job flows and managing and reviewing Hadoop log files
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for managing data coming from different sources and for implementing MongoDB to store and analyze unstructured data
- Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS
- Installed and configured Hive and wrote Hive UDFs (see the sketch after this list)
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data
- Created HBase tables to store variable data formats of PII data coming from different portfolios
- Cluster coordination services through Zookeeper
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management, and was involved in building templates and screens in HTML and JavaScript
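A minimal sketch of a Hive UDF of the kind mentioned above; the original UDFs were written in Java, but the same API is shown here in Scala for consistency with the other sketches. The function name and masking rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks the local part of an email address (e.g. for PII columns) before it
// reaches reporting tables.
class MaskEmail extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val s = input.toString
      val at = s.indexOf('@')
      val masked =
        if (at > 1) s.substring(0, 1) + "***" + s.substring(at)
        else s
      new Text(masked)
    }
  }
}

// Registered in Hive after adding the jar, e.g.:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
//   SELECT mask_email(email) FROM customers;
```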
Environment: Hadoop, HDFS, MapReduce, Pig, Sqoop, UNIX, HBase, Java, JavaScript, HTML
SQL/Java Developer
Confidential
Responsibilities:
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Implemented a CDH3 Hadoop cluster on CentOS.
- Worked on installing clusters, commissioning and decommissioning data nodes, name node recovery, capacity planning, and slot configuration.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Importing the unstructured data into the HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Involved in using the HBase Java API in a Java application (see the sketch after this list).
- Automated all jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
- Developed Pig Latin scripts to extract the data from the output files to load into HDFS.
- Responsible for managing data from multiple sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Java-based MapReduce.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
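A minimal sketch of the HBase Java client API usage mentioned above (written here in Scala for consistency with the other sketches, and assuming an HBase 1.x-style client); table, column family, and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientExample {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()              // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("parsed_logs"))

      // Write one parsed log record.
      val rowKey = Bytes.toBytes("host01#2015-05-01T10:00:00")
      val put = new Put(rowKey)
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("level"), Bytes.toBytes("ERROR"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("message"), Bytes.toBytes("disk full"))
      table.put(put)

      // Read it back.
      val result = table.get(new Get(rowKey))
      val level = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("level")))
      println(s"level = $level")

      table.close()
    } finally {
      connection.close()
    }
  }
}
```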
Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Sqoop, Pig, Zookeeper, Java, ETL, SQL, CentOS.