Senior Hadoop Developer Resume
Charlotte, NC
SUMMARY
- 7+ years of IT industry experience, including 5 years working with Apache Hadoop components such as HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Oozie, ZooKeeper, HBase, Cassandra, and MongoDB, as well as Amazon Web Services.
- 3+ years of experience in the Application Development and Maintenance of SDLC projects using Java technologies.
- Good experience working with the Hortonworks, Cloudera, and MapR distributions.
- Very good understanding of Hadoop architecture and its components, including HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
- Developed applications for distributed environments using Hadoop, MapReduce, and Python.
- Experience in data extraction and transformation using MapReduce jobs.
- Proficient in working with Hadoop, HDFS, writing PIG scripts and Sqoop scripts.
- Performed data analysis using Hive and Pig.
- Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
- Experience in importing and exporting data using Sqoop from relational database systems to HDFS and vice versa.
- Strong understanding of NoSQL databases like HBase, MongoDB & Cassandra.
- Strong understanding of Spark Streaming and Spark SQL, with experience loading data from external sources such as MySQL and Cassandra into Spark applications (a minimal sketch follows this summary).
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Well versed with job workflow scheduling and monitoring tools such as Oozie.
- Developed MapReduce jobs to automate transfer of data from HBase.
- Practical knowledge on implementing Kafka with third-party systems, such as Spark and Hadoop.
- Loaded streaming log data from various webservers into HDFS using Flume.
- Experience in using Sqoop, Oozie and Cloudera Manager.
- Hands on experience in application development using RDBMS, and Linux shell scripting.
- Experience working with Amazon EMR and EC2 Spot Instances.
- Experience in integrating Hadoop with Ganglia and have good understanding of Hadoop metrics and visualization using Ganglia.
- Support development, testing, and operations teams during new system deployments.
- Extensively worked with the Unified Modeling Language (UML), designing use cases, activity flow diagrams, class diagrams, and sequence and object diagrams using Rational Rose and MS Visio.
- Very good experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Hands on experience in Tableau to generate Hadoop data report.
- Good team player and can work efficiently in multiple team environments and multiple products. Easily adaptable to the new systems and environments.
- Possess excellent communication and analytical skills along with a can-do attitude.
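The Spark data-loading experience above can be illustrated with a minimal sketch: pulling a MySQL table into a Spark DataFrame over JDBC in the Spark 1.x style used on these projects. The host, database, credentials, and the customers table are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object MySqlToSpark {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MySqlToSpark"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical connection details and table; the MySQL JDBC driver
    // must be on the classpath (e.g. passed via --jars at spark-submit).
    val customers = sqlContext.read.format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/sales")
      .option("dbtable", "customers")
      .option("user", "etl_user")
      .option("password", "********")
      .load()

    // Register a temp table so downstream Spark SQL can query it.
    customers.registerTempTable("customers")
    sqlContext.sql("SELECT state, COUNT(*) FROM customers GROUP BY state").show()
  }
}
```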
TECHNICAL SKILLS
Programming languages: C, C++, Java, Python, Scala, R, Linux shell scripts
HADOOP/BIG DATA: MapReduce, Spark, SparkSQL, PySpark, SparkR, Pig, Hive, Sqoop, HBase, Flume, Kafka, Cassandra, YARN, Oozie, ZooKeeper
Databases: MySQL, Oracle 9i/10g/11g, PL/SQL, MongoDB, HBase, Cassandra, Netezza, Teradata
Operating Systems: Windows, Unix, Linux, Ubuntu
Web Development: HTML, JSP, JavaScript, jQuery, CSS, XML, AJAX
Reporting Tools: Tableau
Web/Application Servers: Apache Tomcat, Sun Java Application Server
IDE Tools: IntelliJ, Eclipse, NetBeans
Scripting: Bash, JavaScript
Version Control: Git, SVN
Cloud Services: Amazon Web Services
Monitoring Tools: Nagios, Ganglia
Build Tools: Maven
PROFESSIONAL EXPERIENCE
Senior Hadoop Developer
Confidential, Charlotte, NC
Responsibilities:
- Assessed current and future ingestion requirements, reviewed data sources and formats, and recommended processes for loading data into Hadoop.
- Developed ETL applications using Hive, Spark, Impala, and Sqoop; automated them with Oozie workflows and shell scripts with error handling, and scheduled them using Autosys.
- Built Sqoop jobs to import massive amounts of data from relational databases (Teradata and Netezza) and back-populate it on the Hadoop platform.
- Created a common workflow to convert mainframe source data from EBCDIC to ASCII and land it in HDFS as delimited files in Avro format.
- Worked on Avro and Parquet file formats with Snappy compression.
- Developed scripts to perform business transformations on the data using Hive and Pig.
- Created Impala views on top of Hive tables for faster access when analyzing data through Hue/TOAD.
- Connected Impala with BI tools such as TOAD and SQL Assistant to help the modeling team run risk models.
- Built a proof of concept on Spark, developing Spark programs in Scala and Spark SQL for business reports.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream in HDFS using Scala (see the sketch after this list).
- Developed BTEQ scripts for moving data from staging tables to final tables in Teradata as part of automation.
- Supported architecture and design reviews, code reviews, and best practices to implement the Hadoop architecture.
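A minimal sketch of the Kafka-to-HDFS streaming bullet above, using the Spark 1.6 direct stream API in Scala; the broker addresses, the events topic, and the HDFS output path are assumptions, not project specifics.

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    // One micro-batch per minute.
    val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHdfs"), Seconds(60))

    // Hypothetical broker list and topic.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("events"))

    // Persist each non-empty micro-batch to a time-stamped HDFS directory.
    stream.map { case (_, value) => value }.foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/events/batch-${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The direct stream avoids a separate receiver and maps Kafka partitions one-to-one onto RDD partitions, which keeps the HDFS writes evenly spread across the cluster.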
Environment: CDH4/CDH5, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Impala, Spark, Kafka, Teradata, Linux, Java, Eclipse, SQL Assistant, TOAD
Senior Hadoop Developer
Confidential
Responsibilities:
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa
- Knowledge in converting Hive or SQL queries into Spark transformations using Python and Scala.
- Experience in using SequenceFile, RCFile, ORC, and Avro file formats; managed and reviewed Hadoop log files.
- Experience in creating Sqoop jobs with incremental load to populate Hive External tables.
- Developed a Spark/Scala application that generates a daily monitoring report tracking the status of file ingestions and scheduled runs in the data lake.
- Experience in working with Tivoli Workload Scheduler (TWS) and good knowledge of composing and scheduling jobs.
- Experience in migrating Perl, Python, and shell scripts to Spark/Scala code to improve performance.
- Experience in parsing custom-format data into CSV and cleaning such data using Spark jobs.
- Good experience in writing shell scripts to support additional functionality for the application.
- Good experience in storing intermediate data and metadata in SQL to track the status of ingestion in the data lake.
- Developed a Spark/Scala and shell application to track small files in the cluster and merge them.
- Created a statistics report that depicts the number of small files merged and current memory space occupied.
- Experience in using SVN with TortoiseSVN as a code repository.
- Strong expertise in Hive internal and external tables; created Hive tables to store processed results in tabular format.
- Firm knowledge of HDFS commands, from basic to advanced operations.
- Provided production support to resolve issues for deployed applications.
- Strong knowledge of building complex data pipelines using transformations, aggregations, cleansing, and filtering.
- Experience in writing TWS schedules to run at regular intervals.
- Knowledge of incremental import, free-form query import, export, and Hadoop ecosystem integration using Sqoop.
- Experience in working with Hortonworks Distribution.
- Good understanding of Partitioning, Bucketing, Join optimizations and query optimizations in Hive.
- Developed a JSON parser in Spark/Scala that converts JSON files to flat files (see the sketch after this list).
- Good experience communicating with the offshore team through daily status calls.
- Experience in handling production issues and good exposure to the telecom domain.
- Practical experience in developing Spark applications in IntelliJ with Maven.
- Involved in loading data from UNIX file system to HDFS and developed UNIX scripts for job scheduling, process management and for handling logs from Hadoop.
- Developed test cases for Unit testing and performed integration and system testing.
- Developed applications using ATT proprietary software, coordinating with various offshore and onshore teams.
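A minimal sketch of the JSON-to-flat-file parser mentioned above, in Spark 1.x Scala; the input path, the field names, and the pipe delimiter are assumptions for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JsonToFlat {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JsonToFlat"))
    val sqlContext = new SQLContext(sc)

    // Hypothetical input: one JSON object per line, schema inferred by Spark.
    val records = sqlContext.read.json("hdfs:///landing/feeds/*.json")

    // Project the fields the downstream feed expects (assumed names)
    // and write them out as pipe-delimited flat files.
    records.select("id", "name", "event_ts")
      .map(row => row.mkString("|"))
      .saveAsTextFile("hdfs:///processed/feeds_flat")
  }
}
```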
Environment: Hadoop 2.7.3, Java 1.7, Spark 1.6.0, SparkSQL, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Maven, IntelliJ, UNIX Shell scripting, Oracle 11g/10g, Linux, SVN
Senior Hadoop Developer
Confidential
Responsibilities:
- Developed Spark Core and Spark SQL scripts using Scala for faster data processing.
- Developed Sqoop scripts for one-time data loads from Oracle into Hive.
- Developed Spark SQL ETL code to pull incremental data (inserts and updates to existing records) from Oracle and store it in Hive (see the sketch after this list).
- Used big data processing tools such as Hive, Spark Core, and Spark SQL for batch processing of large data sets on the Hadoop cluster.
- Improved Spark job performance by properly allocating the available cluster resources to each job.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Migrated various Hive UDFs and queries into Spark SQL, generating data analysis reports with increased performance.
- Loaded data into Spark RDDs for in-memory computation to generate the output response.
- Observed the Spark UI graph to analyze jobs while they were running.
- Troubleshot Spark jobs by examining the log files available in the Spark job browser.
- Optimized Spark code to repartition data as part of improving Spark job performance.
- Developed the shell script to load data from Teradata into Hive tables using TDCH.
- Participated in code review sessions with the team, discussing design patterns as part of Scala code optimization.
- Worked on developing a REST API with Spring Boot.
- Implemented the data layer using a MySQL database and designed a JSON structure that minimizes dependency and redundancy.
- Developed Mockito test cases using MockitoJUnitRunner, following a test-driven methodology.
- Integrated the JSON objects, DB objects, and business logic through the FasterXML Jackson ObjectMapper.
- Implemented different levels of logging using the Log4j 2 logger.
- Incorporated error handling through exception handling in different layers, generating custom exceptions and error codes.
- Participated in daily stand-ups, sprint planning, and review meetings.
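A minimal sketch of the incremental Oracle-to-Hive pull described above, assuming a hypothetical orders table with an updated_at column and a watermark kept in a control table; written against the Spark 1.x HiveContext API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object OracleIncrementalLoad {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("OracleIncrementalLoad"))
    val hiveContext = new HiveContext(sc)

    // Assumed watermark; in practice this would be read from a control table.
    val lastLoadTs = "2016-01-01 00:00:00"

    // Push the incremental filter down to Oracle as an inline view so only
    // new and updated rows cross the wire (table and column names assumed).
    val delta = hiveContext.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@dbhost:1521/ORCL")
      .option("dbtable",
        s"(SELECT * FROM orders WHERE updated_at > " +
        s"TO_TIMESTAMP('$lastLoadTs', 'YYYY-MM-DD HH24:MI:SS')) t")
      .option("user", "etl_user")
      .option("password", "********")
      .load()

    // Append the delta to a Hive staging table for downstream merging.
    delta.write.mode("append").saveAsTable("staging.orders_delta")
  }
}
```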
Environment: Hadoop, HDFS, Hive, Scala, SparkSQL, UNIX Shell Scripting, Oracle, Teradata, Log4j 2, MySQL, Spring Boot, Java, Maven, Windows, Eclipse
Senior Hadoop Developer
Confidential, Alpharetta, GA
Responsibilities:
- Involved in defining job flows, managing and reviewing log files.
- Supported MapReduce programs running on the cluster.
- As a big data developer, implemented solutions for ingesting data from various sources and processing data at rest, utilizing big data technologies such as Hadoop, the MapReduce framework, HBase, Hive, Oozie, Flume, Kafka, and Sqoop.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Imported bulk data into HBase using MapReduce programs.
- Developed Apache Pig and Hive scripts to process HDFS data.
- Performed analytics on time-series data in HBase using the HBase API (see the sketch after this list).
- Designed and implemented Incremental Imports into Hive tables.
- Involved in collecting, aggregating, and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in file processing using Pig Latin.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Experience in optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Used Hive to find correlations between customers' browser logs across different sites and analyzed them to build risk profiles for those sites.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
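A minimal sketch of the time-series analytics bullet above, using the classic HBase client API from Scala; the metrics table, the metricId#timestamp row-key layout, and the column names are assumptions.

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.{HTable, Scan}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

object TimeSeriesScan {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    // Hypothetical table keyed as <metricId>#<yyyyMMddHHmm>, a common
    // time-series layout, with column family "d" and qualifier "value".
    val table = new HTable(conf, "metrics")
    try {
      val scan = new Scan()
      scan.setStartRow(Bytes.toBytes("sensor42#201401010000"))
      scan.setStopRow(Bytes.toBytes("sensor42#201402010000"))
      scan.addColumn(Bytes.toBytes("d"), Bytes.toBytes("value"))

      val scanner = table.getScanner(scan)
      try {
        // Average the readings in the scanned time window.
        val values = scanner.iterator().asScala
          .map(r => Bytes.toDouble(r.getValue(Bytes.toBytes("d"), Bytes.toBytes("value"))))
          .toList
        if (values.nonEmpty) println(s"avg = ${values.sum / values.size}")
      } finally scanner.close()
    } finally table.close()
  }
}
```

Prefixing the row key with the metric id keeps one series contiguous on disk, so a start/stop-row scan reads exactly the requested window.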
Environment: Java, Hadoop 2.1.0, MapReduce 2, Pig 0.12.0, Hive 0.13.0, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, and Cloudera CDH4.
Senior Hadoop Developer
Confidential
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Good knowledge of implementing image processing with Spark.
- Experience in building batch and streaming applications with Apache Spark and Python.
- Experience in harnessing parallel computing to support Spark machine learning applications.
- Experience in deploying machine learning algorithms and models and scale them for real-time events.
- Experienced in running Apache Pig scripts to convert XML data to JSON.
- Used Scala extensively for the processing and for extracting the images.
- Good knowledge of dimensionality reduction techniques in MLlib in Scala and Java.
- Understanding of the Matplotlib library for displaying images, and experience in extracting images as vectors.
- Experience with the Java Abstract Window Toolkit (AWT), used for basic image processing functions.
- Strong understanding of mappings, search queries, filters, and query validation in Elasticsearch applications.
- Practical experience in defining queries on JSON data using the Query DSL provided by Elasticsearch.
- Experience in improving search focus and quality in Elasticsearch using aggregations and Python scripts.
- Analyzed the data by performing Hive queries and running Pig scripts.
- Implemented partitioning, dynamic partitions, and buckets in Hive.
- Experience in optimizing an HBase cluster using different Hadoop and HBase parameters.
- Good knowledge of the HBase data model and its operations, along with various troubleshooting and maintenance techniques.
- Good understanding of data storage, replication, data scanning, and data filtering in HBase.
- Experience in reading data from and writing data to Amazon S3 in Spark applications (see the sketch after this list).
- Experience in selecting and configuring the right Amazon EC2 instances and accessing AWS services using client tools and AWS SDKs.
- Knowledge of using AWS Identity and Access Management (IAM) to secure access to EC2 instances and configuring Auto Scaling groups using CloudWatch.
- Good understanding of the internals of Kafka design, message compression and replication.
- Experience in maintaining and operating Kafka and monitor it consistently and effectively using cluster management tools.
- Experience in integrating Kafka with other tools for logging and packaging.
- Experience in transferring data between HDFS and RDBMS using Sqoop.
- Knowledge of adding and describing third-party connectors in Sqoop.
- Knowledge of incremental import, free-form query import, export, and Hadoop ecosystem integration using Sqoop.
- Ran machine learning Spark jobs on Hadoop using Oozie and created quick Oozie jobs using Hue.
- Scheduled Sqoop jobs through Oozie to import data from databases into HDFS.
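A minimal sketch of reading from and writing to Amazon S3 from a Spark application in Scala; the bucket, prefixes, and the ERROR filter are hypothetical, and credentials are assumed to come from the instance's IAM role as is typical on EMR.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object S3LogFilter {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("S3LogFilter"))

    // Outside EMR, keys can be supplied on the Hadoop configuration instead:
    // sc.hadoopConfiguration.set("fs.s3a.access.key", "...")
    // sc.hadoopConfiguration.set("fs.s3a.secret.key", "...")

    // Hypothetical bucket layout: read raw gzipped logs, keep error lines,
    // and write the filtered set back to S3.
    val logs = sc.textFile("s3a://example-bucket/logs/2016/*/*.gz")
    logs.filter(_.contains("ERROR"))
        .saveAsTextFile("s3a://example-bucket/errors/2016")
  }
}
```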
Environment: Amazon Web Services, Java 7, Hadoop 2.4.0, Spark, MLlib, Python, Scala, HBase, Elasticsearch, Apache Pig 0.12.0, Apache Hive 0.13.0, MapReduce, HDFS, Sqoop, Oozie, Kafka, ZooKeeper, Maven, Eclipse, Nagios, Ganglia, Git, UNIX Shell scripting, Oracle 11g/10g, Linux, Agile development.
