Senior Hadoop Developer Resume
Union, NJ
SUMMARY:
- About 8+ years of experience in the IT industry, including technical proficiency in Big Data environments with extensive expertise in development on the Hadoop ecosystem and Java.
- Extensive experience with Hadoop platform components: MapReduce (MRv1, YARN), Hive, Pig, Sqoop, Oozie, HBase, Spark, Spark Streaming, Spark SQL, Elasticsearch and Scala.
- Experience working on Amazon Web Services (AWS) cloud infrastructure.
- Extensive knowledge of MongoDB concepts and good knowledge of its administration.
- Good experience in developing and implementing Spark and its streaming functionality using Scala and Python to work with real-time data.
- Proficient in writing MapReduce programs and using the Apache Hadoop Java API for analyzing structured and unstructured data.
- Extensive experience in fine-tuning and optimizing the performance of Spark and Spark Streaming jobs.
- Worked on replacing MR jobs and Hive scripts with Spark SQL and Spark data transformations for efficient data processing.
- Hands-on experience converting complex MapReduce programs into Spark RDD operations such as transformations and actions.
- Worked on loading Parquet/TXT files in the Spark framework using Java/Scala, creating Spark DataFrames and RDDs to process the data, and saving the output in Parquet format in HDFS to load into a fact table using the ORC reader.
- Knowledge of using Apache Spark with Cassandra.
- Monitored data streaming (DS) between web sources and HDFS (Hadoop Distributed File System).
- Installation, configuration, management, support and monitoring of Hadoop clusters using various distributions and tools such as Apache Spark, Cloudera and the AWS service console.
- Development of Spark Streaming Consumer Application integrating Kafka.
- Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as NameNode and DataNode, MapReduce concepts, Spark execution concepts and the HDFS framework.
- Familiar with MongoDB clusters and Java scripting to load unstructured data into a sharded environment.
- Used Apache Kafka to aggregate log data from multiple servers and make it available to downstream systems for analysis using Spark Streaming.
- Involved in designing the various stages of migration from RDBMS to Cassandra.
- Experience in launching EMR clusters, Redshift clusters, EC2 instances, Amazon Data Pipeline and Simple Workflow Service.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience in writing Pig Latin scripts to sort, group, join and filter the data.
- Experience in writing UDFs in Java for Hive and Pig.
- Successfully retrieved consumer group lag from Kafka using its API.
- Hands-on experience in setting up workflows using the Apache Oozie workflow engine for managing and scheduling Hadoop jobs.
- Strong knowledge of NoSQL databases such as Cassandra (column-oriented) and MongoDB (document-oriented), and their integration with Hadoop clusters. Working experience with HBase and Elasticsearch.
- Good knowledge of Object-Oriented Analysis and Design (OOAD) and Java design patterns, and a good level of expertise in Core Java.
- Comprehensive knowledge of Software Development Life Cycle, Agile methodology, coupled with excellent communication skills.
- Strong analytical and problem-solving skills.
- Implemented microservices in Scala along with Apache Kafka.
- Experience working in both team and individual environments. Always eager to learn new technologies and implement them in challenging environments.
- Team player with good interpersonal, communication and presentation skills. Exceptional ability to learn and master new technologies and to deliver outputs under tight deadlines.
TECHNICAL SKILLS:
Hadoop Platform: MapReduce, Hive, HBase, Pig, Sqoop, Oozie, Impala, Spark Streaming, Spark SQL
Programming: Core Java, SQL, Shell scripting, C, C++
AWS Hadoop Services: S3, EMR, Simple Workflow Service, Data Pipeline, Redshift
Operating Systems: Linux (RedHat, CentOS), Windows XP/7/8, Mac OS
NoSQL Databases: Cassandra, MongoDB, HBase, Bigtable, Elasticsearch
ETL: Pentaho Report Designer, Logstash
BI Tools: Tableau, Kibana
Hadoop Platform Distributions: Hadoop, HDP, Cloudera (CDH3, CDH4, CDH5), Pivotal HD (2.0), AWS, GCP
PROFESSIONAL EXPERIENCE:
Confidential, Union, NJ
Senior Hadoop Developer
Responsibilities:
- Involved in building the data engineering platform on AWS for ingesting, aggregating and visualizing real-time streaming data from multiple sources.
- Developed Spark Streaming jobs that stream data from Kafka topics and perform transformations on it.
- Worked extensively on the Spark framework using Scala to perform ETL operations.
- Involved in end-to-end development, testing, deployment and performance tuning of Spark jobs.
- Developed parsers using the Scala API for parsing data from different sources and data formats such as byte code, JSON and CSV.
- Designed and configured topics in the new Kafka cluster across all environments.
- Worked extensively on optimizing and tuning the Spark Streaming application to provide real-time access to data.
- Managed Amazon Web Services (AWS) resources: ELB, EC2, S3, EMR and CloudWatch.
- Worked on both the receiver-based approach and the direct stream approach for streaming real-time data from Kafka using Spark Streaming.
- Deployed EMR clusters on AWS.
- Installed Kafka Manager to track consumer lag and monitor Kafka metrics; it was also used to add topics, partitions, etc.
- Involved in multiple code improvements that significantly reduced the processing time for a single streaming batch, optimizing the performance of the pipeline.
- Hands-on experience with the Amazon EMR framework, transferring data to EC2 servers.
- Developed a parser for converting network data from byte code format to JSON format using the Scala API.
- Developed automated scripts for provisioning clusters for Kafka, Zookeeper and Elasticsearch.
Environment: Scala, Spark, Spark Streaming, Kafka, Elasticsearch, Zookeeper, Python, Java, Shell Scripting, AWS EMR.
Confidential, Tampa, FL
Hadoop Developer
Responsibilities:
- Worked on Spark SQL code as an alternative approach for faster data processing and better performance.
- Proposed an automated system using shell scripts for the Hadoop job deployment process.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts; involved in writing Hive and Pig scripts for complex transformations.
- Used Kafka as a distributed, partitioned, replicated commit log service to maintain message feeds, and created applications that monitor consumer lag within Apache Kafka clusters.
- Implemented custom Hive UDFs to achieve comprehensive data analysis.
- Wrote Oozie workflows to run multiple Hive, shell script and Pig jobs that run independently based on time and data availability.
- Prepared Pig scripts and Spark SQL to handle all the transformations specified in the S2TMs and to handle SCD1 and SCD2 scenarios.
- Used Apache NiFi to implement a system to store, send and ingest data from hundreds of devices.
- Loaded data into Spark RDDs and cached them to avoid repeated shuffling; experienced with batch processing of data sources using Apache Spark.
- Experience developing APIs and frameworks for YARN applications using Apache Tez.
- Developed a system to monitor Agile teams and performed log analysis on the ELK Stack.
- Experience in managing large-scale, geographically distributed database systems, including relational (Oracle, SQL Server) and NoSQL (MongoDB, Cassandra) systems.
- Involved in ingesting data into IDW staging directly through Spark Sqoop to push data into HDFS.
- Handled installation, administration and configuration of ELK Stack on AWS and performed log analysis.
- Experience in developing custom processors in Apache NiFi.
- Designed a messaging system using Apache Kafka to send messages across teams.
- Used Shell scripting for automation of scripts.
- Worked on QA support activities, test data creation and Unit testing activities.
- Worked in Agile development approach.
Environment: Hortonworks Data Platform (HDP), Apache Tez, HDFS, Kafka, Spark RDD, HBase, Hive, Java, Sqoop, Oracle, MySQL, Spark, Storm, NoSQL, Apache NiFi, ELK Stack.
Confidential, Omaha, NE
Hadoop Developer
Responsibilities:
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Worked on automating delta feeds from Teradata using Sqoop, and from FTP servers, into Hive.
- Involved in loading one of the largest tables (the SCAN table) from Teradata to Hadoop using the TPT utility.
- Developed Spark code using Scala and Spark SQL for faster testing and processing of data.
- Responsible for loading, aggregating and moving large amounts of log data using Flume.
- Worked extensively on the Spark Core and Spark SQL modules.
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Strong skills in SQL, Hive and Impala to extract data from SQL Server, Oracle and Hadoop databases.
- Involved in analyzing the existing BTEQ scripts on mainframes and implementing the same logic in Hadoop.
- Created complex queries aggregating large datasets in Impala to perform data quality checks for the project.
- Involved in exporting data from Hadoop to Greenplum using the gpload utility.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
- Used Sqoop to import the data from RDBMS to Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop Components.
- Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in loading and transforming large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Participated in requirements gathering with experts and business partners and converted the requirements into technical specifications.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
- Involved in loading data from the Linux file system to HDFS.
Environment: Hadoop (Cloudera, Pivotal HD), Teradata 13.0, Pig, Hive, Sqoop, Flume, MapReduce, HDFS, Linux, Oozie, Spark, Impala.
Confidential, Boston, MA
Java-Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Worked on tuning the performance of Pig queries.
- Experience working on processing unstructured data using Pig and Hive.
- Worked on evaluating complex business metrics in Pig, MapReduce.
- Created Hive scripts to process the data for analysis.
- Focused on programming different Java modules and integration.
- Implemented Java mail services for email notifications.
- Actively involved in the design and development of Java/JEE components.
Environment: Kafka, Data Pipeline, MapReduce (Java), Hive, Pig
Confidential
Java Developer
Responsibilities:
- Used Java, JSP and JSTL to enhance functionality; responsible for creating database tables on DB2.
- Wrote JavaScript code for front-end validation.
- Involved in various phases of the software development life cycle (SDLC), such as requirements gathering, data modeling analysis, architecture design and development for the project.
- Worked on Java Message Service (JMS) for developing messaging services.
- Developed server-side services using Java concepts. Involved in core Java technologies, multithreading and exception handling.
- Involved in developing front-end applications that interact with the mainframe applications using J2C connectors.
- Used JDBC for object-relational mapping and persistence.
- Designed and implemented a scalable, RESTful, microservices-based back end. The back end is written in Java using Spring Boot for simplicity and scalability.
- Used JUnit to develop test cases for unit testing.
- Used JIRA as a bug reporting tool for updating the bug report.
- Developed new and maintained existing functionality using Spring MVC and Hibernate.
Environment: HTML, JavaScript, CSS, Servlets, JSP, XML, ANT, SOAP, JIRA, JUnit, Ajax, Git
Confidential
Junior Java Developer
Responsibilities:
- Involved in gathering business requirements, analyzing the project and creating UML diagrams such as Use cases, class diagrams and flow charts.
- Developed front end using JSTL, JSP, HTML and JavaScript.
- Created new and maintained existing web pages built with JSP and Servlets.
- Worked extensively on views, stored procedures, triggers and SQL queries for loading data (staging) to enhance and maintain the existing functionality.
- Coded and developed multi-tiered architecture in Java, J2EE, Servlets.
- Consumed Web Services (WSDL, SOAP, UDDI) from third party for authorized payments to/from customers.
- Developed Hibernate mapping (.hbm.xml) files for mapping declarations.
- Actively involved from the start of the project, from requirements gathering through quality assurance testing.
- Wrote and modified database queries and stored procedures for Oracle9i.
Environment: Java JDK 1.5, Oracle, Java/J2EE, JSP, WebLogic Application Server, HTML, Servlets, UML, XML, WSDL, SOAP, UDDI.
