Hadoop Developer Resume
Philadelphia, PA
SUMMARY
- 7 years of experience in IT, including Big Data technologies, the Hadoop ecosystem, Data Warehousing, and SQL-related technologies.
- Extensive experience as a Hadoop Developer and Big Data Analyst. Primary technical skills in HDFS, MapReduce, YARN, Hive, Sqoop, HBase, CA7, Flume, Oozie, and Zookeeper.
- Working experience with Big Data and the Hadoop Distributed File System (HDFS). In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience in developing MapReduce programs using Apache Hadoop for Big Data analysis.
- Hands-on experience working with ecosystem components such as Hive, Sqoop, Spark, MapReduce, Flume, and Oozie.
- Knowledge of Scala language features: language fundamentals, classes, objects, traits, collections, case classes, higher-order functions, pattern matching, extractors, etc.
- Experience in creating internal and external Hive tables and implementing performance-improvement techniques such as partitioning and bucketing (see the sketch after this list).
- Developed Sqoop scripts to import data from RDBMS to Hive and HDFS, and to export data from HDFS back to RDBMS.
- Loaded data into HDFS from dynamically generated files and relational database management systems using Sqoop.
- Experience handling different file formats such as JSON, Avro, ORC, and Parquet.
- Knowledge of job workflow scheduling and monitoring tools like Oozie.
- Experience with RDBMS databases such as MySQL and Oracle, and expertise in writing SQL and HiveQL queries.
- Experience with scripting languages such as Bash and UNIX shell scripting, and knowledge of Python.
- Expertise in preparing test cases, documenting, and performing unit and integration testing.
- Working experience with data ingestion tools such as Apache NiFi, as well as loading data into a common data lake using HiveQL.
- Developed wrapper shell scripts to schedule HiveQL-based data loads as batch jobs.
- Experience in various phases of Agile Development such as Requirement, Analysis, Design, Development, and Unit Testing.
- Developed Sqoop scripts for importing large datasets from RDBMS to HDFS. Knowledge of creating UDFs in Java and registering them in Pig and Hive.
- Experience with Spark Streaming and Apache Kafka for ingesting live streaming data.
- Developed Spark scripts using the Scala shell as per requirements.
- Extensive experience in developing applications that perform data processing tasks using Teradata, Oracle, SQL Server, and MySQL databases.
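The following is a minimal, illustrative sketch of the internal/external Hive tables with partitioning and bucketing referred to above; database, table, and column names are hypothetical, and the DDL is issued through a Spark session in Scala (it could equally be run directly in the Hive CLI or Beeline).

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSetup {
  def main(args: Array[String]): Unit = {
    // Hive support is required so the DDL goes against the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw files already landed in HDFS (hypothetical path and schema).
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.orders_raw (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE, order_ts STRING)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
        |LOCATION '/data/raw/orders'""".stripMargin)

    // Managed (internal) table, partitioned by date for pruning and bucketed
    // by customer_id to speed up joins and sampling.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS curated.orders (
        |  order_id BIGINT, customer_id BIGINT, amount DOUBLE)
        |PARTITIONED BY (order_date STRING)
        |CLUSTERED BY (customer_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    spark.stop()
  }
}
```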
TECHNICAL SKILLS
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Hadoop Distribution: Cloudera, Hortonworks, Apache, AWS
Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions
Web Technologies: HTML, CSS, JavaScript, XML, JSP, Restful, SOAP
Operating Systems: Windows (98/2000/XP/7/8/10), Mac OS, UNIX, Linux, Ubuntu, CentOS
Build Automation tools: SBT, Ant, Maven
Version Control: Git, GitHub, Bitbucket
IDE & Build Tools: Eclipse, Visual Studio, NetBeans, JUnit, SQL Developer, MySQL Workbench
Databases: Oracle, SQL Server, MySQL, MS Access; NoSQL databases (HBase, MongoDB).
Cloud Technologies: MS Azure, Amazon Web Services, .NET Core and ASP.NET Core
PROFESSIONAL EXPERIENCE
Hadoop Developer
Confidential - Philadelphia, PA
Responsibilities:
- Involved in the complete Big Data flow of the application, from ingesting data from upstream sources into HDFS to processing and analyzing the data in HDFS.
- Developed Spark APIs to import data from Teradata into HDFS and created Hive tables.
- Developed Sqoop jobs to import data in Avro file format from the Oracle database and created Hive tables on top of it.
- Created partitioned and bucketed Hive tables in Parquet format with Snappy compression, then loaded data from the Avro tables into the Parquet tables (see the first sketch after this list).
- Ran Hive scripts through Hive, Impala, and Hive on Spark, and some through Spark SQL.
- Involved in performance tuning of Hive from design, storage and query perspectives.
- Developed a Flume ETL job with an HTTP source and an HDFS sink.
- Collected JSON data from the HTTP source and developed Spark APIs that perform inserts and updates in Hive tables.
- Developed Spark scripts to import large files from Amazon S3 buckets.
- Developed Spark core and Spark SQL scripts using Scala for faster data processing.
- Developed Kafka consumer APIs in Scala for consuming data from Kafka topics (see the second sketch after this list).
- Involved in designing and developing HBase tables and storing aggregated data from Hive tables.
- Integrated Hive and Tableau Desktop reports and published to Tableau Server.
- Developed shell scripts for running Hive scripts in Hive and Impala.
- Orchestrated a number of Sqoop and Hive scripts using Oozie workflows and scheduled them with the Oozie coordinator.
- Used Jira for bug tracking and Bitbucket to check in and check out code changes.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
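A minimal sketch of the Avro-to-Parquet load described in the partitioned/bucketed tables bullet above: it reads an Avro-backed Hive staging table and rewrites it as a Snappy-compressed, date-partitioned Parquet table. Database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object AvroToParquetLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("avro-to-parquet-load")
      .enableHiveSupport()
      .getOrCreate()

    // Snappy is Spark's default Parquet codec; set it explicitly for clarity.
    spark.conf.set("spark.sql.parquet.compression.codec", "snappy")

    // Read the Avro-backed staging table registered in the Hive metastore and
    // rewrite it as a Parquet table partitioned by event_date.
    spark.table("staging.events_avro")
      .write
      .format("parquet")
      .partitionBy("event_date")
      .mode("overwrite")
      .saveAsTable("curated.events_parquet")

    spark.stop()
  }
}
```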
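And a minimal sketch of a Kafka consumer in Scala of the kind mentioned above; broker addresses, topic, and group id are hypothetical, and a Kafka 2.x client (poll with a Duration) is assumed.

```scala
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

object EventConsumer {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092,broker2:9092")
    props.put("group.id", "hive-loader")
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("enable.auto.commit", "false")

    val consumer = new KafkaConsumer[String, String](props)
    consumer.subscribe(Collections.singletonList("events"))

    try {
      while (true) {
        val records = consumer.poll(Duration.ofMillis(500))
        for (record <- records.asScala) {
          // Downstream, each payload would be parsed and merged into Hive; here it is just printed.
          println(s"offset=${record.offset()} key=${record.key()} value=${record.value()}")
        }
        consumer.commitSync() // commit offsets only after the batch has been handled
      }
    } finally {
      consumer.close()
    }
  }
}
```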
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
Hadoop/Big Data Developer
Confidential - Charlotte, NC
Responsibilities:
- Responsible for architecting Hadoop clusters with CDH3; involved in installing CDH3 and upgrading it to CDH4
- Worked on creating keyspaces in Cassandra for saving Spark batch output
- Worked on a Spark application to compact the small files in the Hive warehouse into files close to the HDFS block size (see the sketch after this list)
- Managed migration of on-premises servers to AWS by creating golden images for upload and deployment
- Implemented real-time streaming ingestion using Kafka and Spark Streaming
- Loaded data using Spark Streaming with Scala and Python
- Involved in the requirements and design phases to implement a streaming Lambda architecture for real-time processing using Spark, Kafka, and Scala
- Loaded data into Spark RDDs and performed in-memory computation to generate output responses
- Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions
- Developed a full-text search platform using NoSQL, Logstash, and Elasticsearch, allowing for much faster, more scalable, and more intuitive user searches
- Developed Sqoop scripts to move data between Pig and the MySQL database
- Worked on performance enhancements in Pig, Hive, and HBase across multiple nodes
- Worked with Distributed n-tier architecture and Client/Server architecture
- Supported MapReduce programs running on the cluster and developed multiple MapReduce jobs in Java for data cleaning and pre-processing
- Developed MapReduce applications using Hadoop and HBase
- Evaluated Oozie for workflow orchestration and gained experience in cluster coordination using Zookeeper
- Developed ETL jobs following organization- and project-defined standards and processes
- Experienced in enabling Kerberos authentication in the ETL process
- Implemented data access using Hibernate persistence framework
- Designed the GUI using the Model-View-Controller architecture (Struts framework)
- Integrated Spring DAO for data access using Hibernate and was involved in developing Spring Framework controllers
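A minimal sketch of the small-file compaction mentioned above, assuming a hypothetical Hive table and a 128 MB HDFS block size: the total input size is measured, the number of output files is derived from it, and the data is rewritten into a compacted table.

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.SparkSession

object SmallFileCompaction {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("small-file-compaction")
      .enableHiveSupport()
      .getOrCreate()

    val blockSizeBytes = 128L * 1024 * 1024   // target roughly one HDFS block per output file
    val df = spark.table("warehouse.clickstream")

    // Sum the sizes of the table's underlying files to decide how many output files are needed.
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    val inputBytes = df.inputFiles.map { f =>
      val p = new Path(f)
      p.getFileSystem(hadoopConf).getFileStatus(p).getLen
    }.sum
    val numFiles = math.max(1, (inputBytes / blockSizeBytes).toInt)

    // Rewrite into a separate compacted table (Spark cannot overwrite a table it is reading from).
    df.coalesce(numFiles)
      .write
      .mode("overwrite")
      .saveAsTable("warehouse.clickstream_compacted")

    spark.stop()
  }
}
```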
Environment: Hadoop 2.X, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase, Java, J2EE, Eclipse, HQL.
Sr. Hadoop/Spark Developer
Confidential - Charlotte, NC
Responsibilities:
- Involved in the Complete Software development life cycle (SDLC) to develop the application.
- Worked on analyzing the Hadoop cluster using different big data analytics tools, including Pig, Hive, and MapReduce on EC2.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Worked with different source data file formats such as JSON, CSV, and TSV.
- Imported data from various data sources such as MySQL and Netezza using Sqoop and SFTP, performed transformations using Hive and Pig, and loaded the data back into HDFS.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce.
- Imported and exported data between environments such as MySQL and HDFS and deployed to production.
- Worked on partitioning and bucketing of Hive tables and set tuning parameters to improve performance.
- Involved in developing Impala scripts for ad-hoc queries.
- Used Oozie workflow scheduler templates to manage various jobs such as Sqoop, MapReduce, Pig, Hive, and shell scripts.
- Involved in importing and exporting data from HBase using Spark.
- Involved in a POC for migrating ETLs from Hive to Spark in a Spark-on-YARN environment (see the sketch after this list).
- Actively participated in code reviews and meetings and resolved technical issues.
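A minimal sketch of the kind of Hive-to-Spark migration explored in the POC mentioned above: a HiveQL aggregation (shown in the comment) re-expressed as a Spark DataFrame pipeline. Table and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit, sum}

object HiveToSparkPoc {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-poc")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) AS total
    //   FROM sales.transactions
    //   WHERE txn_date >= '2017-01-01'
    //   GROUP BY region;
    val result = spark.table("sales.transactions")
      .filter(col("txn_date") >= lit("2017-01-01"))
      .groupBy("region")
      .agg(sum("amount").as("total"))

    result.write.mode("overwrite").saveAsTable("sales.region_totals")

    spark.stop()
  }
}
```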
Environment: Apache Hadoop, AWS, EMR, EC2, S3, Hortonworks, MapReduce, Hive, Pig, Sqoop, Apache Spark, Zookeeper, HBase, Java, Oozie, Oracle, MySQL, Netezza, and UNIX Shell Scripting.
Hadoop Developer
Confidential
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing
- Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster
- Imported and exported data to and from HDFS and Hive using Sqoop
- Experienced in defining job flows and managing and reviewing Hadoop log files
- Loaded and transformed large sets of structured, semi-structured, and unstructured data
- Responsible for managing data coming from different sources and for implementing MongoDB to store and analyze unstructured data
- Supported MapReduce programs running on the cluster and was involved in loading data from the UNIX file system to HDFS
- Installed and configured Hive and wrote Hive UDFs (see the sketch after this list)
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs
- Involved in Hadoop cluster tasks such as adding and removing nodes without affecting running jobs or data
- Created HBase tables to store variable data formats of PII data coming from different portfolios
- Cluster coordination services through Zookeeper
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager
- Used the Hibernate ORM framework with the Spring framework for data persistence and transaction management, and was involved in building templates and screens in HTML and JavaScript
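A minimal sketch of a Hive UDF of the kind mentioned above; the original UDFs were written in Java, but the same API is shown here in Scala for consistency with the other sketches. The function name and masking rule are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks the local part of an email address (e.g. for PII columns) before it
// reaches reporting tables.
class MaskEmail extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else {
      val s = input.toString
      val at = s.indexOf('@')
      val masked =
        if (at > 1) s.substring(0, 1) + "***" + s.substring(at)
        else s
      new Text(masked)
    }
  }
}

// Registered in Hive after adding the jar, e.g.:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
//   SELECT mask_email(email) FROM customers;
```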
Environment: Hadoop, HDFS, MapReduce, Pig, Sqoop, UNIX, HBase, Java, JavaScript, HTML
SQL/Java Developer
Confidential
Responsibilities:
- Imported data from MySQL and Oracle into HDFS using Sqoop.
- Implemented a CDH3 Hadoop cluster on CentOS.
- Worked on installing clusters, commissioning and decommissioning data nodes, name node recovery, capacity planning, and slot configuration.
- Developed parser and loader MapReduce applications to retrieve data from HDFS and store it in HBase and Hive.
- Importing the unstructured data into the HDFS using Flume.
- Wrote MapReduce Java programs to analyze log data for large-scale data sets.
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Involved in using the HBase Java API in a Java application (see the sketch after this list).
- Automated all jobs for extracting data from different data sources such as MySQL and pushing the result sets to the Hadoop Distributed File System.
- Developed Pig Latin scripts to extract the data from the output files to load into HDFS.
- Responsible for managing data from multiple sources.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Designed and built many applications to deal with vast amounts of data flowing through multiple Hadoop clusters, using Java-based MapReduce.
- Worked with application teams to install operating system and Hadoop updates, patches, and version upgrades as required.
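A minimal sketch of the HBase Java client API usage mentioned above (written here in Scala for consistency with the other sketches, and assuming an HBase 1.x-style client); table, column family, and row key are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes

object HBaseClientExample {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()              // picks up hbase-site.xml from the classpath
    val connection = ConnectionFactory.createConnection(conf)
    try {
      val table = connection.getTable(TableName.valueOf("parsed_logs"))

      // Write one parsed log record.
      val rowKey = Bytes.toBytes("host01#2015-05-01T10:00:00")
      val put = new Put(rowKey)
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("level"), Bytes.toBytes("ERROR"))
      put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("message"), Bytes.toBytes("disk full"))
      table.put(put)

      // Read it back.
      val result = table.get(new Get(rowKey))
      val level = Bytes.toString(result.getValue(Bytes.toBytes("d"), Bytes.toBytes("level")))
      println(s"level = $level")

      table.close()
    } finally {
      connection.close()
    }
  }
}
```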
Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Sqoop, Pig, Zookeeper, Java, ETL, SQL, CentOS.