Senior Hadoop Developer Resume
Iowa City, IA
SUMMARY:
- Accomplished Software Engineer with 8 years of IT experience developing applications using Big Data, AWS, Java, SQL and Spark.
- Extensive experience with Big Data tools such as MapReduce, YARN, HDFS, HBase, Impala, Hive, Pig, Oozie, AWS and Apache Spark for ingestion, storage, querying, processing and analysis of data.
- Performance tuning in Hive and Impala using methods including dynamic partitioning, bucketing, indexing and file compression.
- Hands-on experience with data ingestion tools Kafka and Flume and workflow management tools Oozie and Zena.
- Hands-on experience handling different file formats like JSON, Avro, ORC and Parquet and compression techniques like Snappy, zlib and LZO.
- Hands-on experience with Hadoop ecosystem components such as HDFS, YARN, Tez, MapReduce, Spark, Scala, Hive, Pig, Sqoop, Flume, Oozie, Kafka, NiFi, Storm and HBase.
- Experience analyzing data in NoSQL databases like HBase and Cassandra and integrating them with a Hadoop cluster.
- Hands-on experience with Spark Core, Spark SQL and the DataFrame/Dataset/RDD APIs.
- Experience using Kafka brokers with Spark Streaming to process live streaming data as RDDs.
- Developed Java applications using IDEs such as Spring Tool Suite and Eclipse.
- Good knowledge of using Hibernate for mapping Java classes to database tables and of Hibernate Query Language (HQL).
- Worked on Java/J2EE systems with different databases, including Oracle, MySQL and DB2.
- Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR), which runs the Hadoop framework on dynamically scalable Amazon EC2 instances.
- Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
- Extensive development experience in Spark applications for data transformations and loading into HDFS using RDDs, DataFrames and Datasets.
- Extensive knowledge of performance tuning of Spark applications and converting Hive/SQL queries into Spark transformations (see the sketch after this summary).
- Hands-on experience with AWS (Amazon Web Services), using Elastic MapReduce (EMR), creating and storing data in S3 buckets and creating Elastic Load Balancers (ELB) for Hadoop front-end web UIs.
- Extensive knowledge of creating Hadoop clusters on multiple EC2 instances in AWS, configuring them through Ambari and using IAM (Identity and Access Management) to create groups and users and assign permissions.
- Extensive programming experience in core Java concepts like OOP, multithreading, collections and I/O.
- Experience using Jira for ticketing issues and Jenkins for continuous integration.
- Extensive experience with UNIX commands, shell scripting and setting up cron jobs.
- Experience in software configuration management using Git.
- Good experience using the relational databases Oracle and MySQL.
- Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping and design.
- Work successfully in fast-paced settings, both independently and in collaborative team environments.
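A minimal sketch of the Hive-to-Spark conversion and HDFS loading described above, assuming a hypothetical Hive table sales with region and amount columns, an illustrative HDFS output path, and a Spark 2.x SparkSession with Hive support:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // DataFrame equivalent of: SELECT region, SUM(amount) FROM sales GROUP BY region
    val aggregated = spark.table("sales")        // hypothetical Hive table
      .groupBy("region")
      .sum("amount")
      .withColumnRenamed("sum(amount)", "total_amount")

    // Load the result into HDFS as Snappy-compressed Parquet
    aggregated.write
      .option("compression", "snappy")
      .parquet("hdfs:///data/out/sales_by_region") // illustrative path

    spark.stop()
  }
}
```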
TECHNICAL SKILLS:
Operating Systems: Windows 95/98/2000/XP, UNIX, Linux
Languages: SQL, HTML, CSS, JavaScript, Java, R
Databases: Oracle, DB2, SQL Server, MySQL, PostgreSQL, MS Access, Hive, Spartan
Utilities: MS Word, Excel, Macros, Access, PowerPoint
Hadoop Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, HBase
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential - Iowa City, IA
Responsibilities:
- Strong understanding and practical experience in developing Spark applications with Scala.
- Developed Spark scripts using Spark shell commands as per requirements.
- Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames and pair RDDs.
- Experience developing Spark SQL applications using both SQL and the DataFrame DSL.
- Extensively worked with the Parquet file format and gained practical knowledge writing Spark and Hive applications to meet Parquet requirements.
- Experience using various compression techniques with the Parquet file format.
- Experience managing datasets and good experience creating test datasets for development purposes.
- Experience building dimension and fact tables using Spark Scala applications.
- Practical knowledge of writing applications in Scala to interact with Hive through Spark.
- Extensively used Hive partitioned tables, map joins and bucketing, and gained a good understanding of dynamic partitioning (see the sketch after this list).
- Performed a POC on writing Spark applications in Scala, Python and R.
- Good hands-on experience with Hive to perform data queries and analysis as part of QA.
- Practical experience using Pig to perform QA by calculating statistics on the final output.
- Experience designing both time-driven and data-driven automated workflows using Oozie.
- Experience writing Sqoop scripts to import data from Exadata to HDFS.
- Good exposure to MongoDB, its functionality and its use cases.
- Gained good exposure to the Hue interface for monitoring job status, managing HDFS files, tracking scheduled jobs and managing Oozie workflows.
- Performed optimizations and performance tuning in Spark and Hive.
- Developed UNIX scripts to automate data loads into HDFS.
- Strong knowledge of HDFS commands to manage files and good understanding of managing the file system through Spark Scala applications.
- Extensive use of aliases for Oozie and HDFS commands.
- Experienced in managing and reviewing Hadoop log files.
- Experience controlling logging for Spark applications, with extensive use of Log4j to log the respective phases of the application.
- Good knowledge of Git commands, version tagging and pull requests.
- Performed unit testing and integration testing after development and participated in code reviews.
- Experience writing JUnit test cases for testing Spark and Spark SQL applications.
- Practical experience developing applications with IntelliJ and Maven.
- Good exposure to Agile environments. Participated in daily standups, Big Room Planning, sprint meetings and team retrospectives.
- Interacted with business analysts to understand business requirements and translate them into technical requirements.
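A minimal sketch of the Scala UDF, aggregation and dynamic-partitioning work listed above; the table names (raw_events, analytics.daily_totals) and the banding logic are hypothetical, and a Spark 2.x SparkSession stands in for the Spark 1.6 HiveContext in use at the time:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum, udf}

object PartitionedAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("udf-and-dynamic-partitions")
      .enableHiveSupport()
      .getOrCreate()

    // Allow Hive-style dynamic partitioning on inserts
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    // A simple UDF that buckets amounts into bands (illustrative logic)
    val amountBand = udf((amount: Double) => if (amount >= 1000.0) "high" else "low")

    val totals = spark.table("raw_events")   // hypothetical source table
      .withColumn("band", amountBand(col("amount")))
      .groupBy(col("event_date"), col("band"))
      .agg(sum("amount").as("total"))

    // Write a Hive table partitioned by event_date
    totals.write
      .mode("overwrite")
      .partitionBy("event_date")
      .saveAsTable("analytics.daily_totals")  // hypothetical target table

    spark.stop()
  }
}
```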
Environment: Hadoop 2.6.0-cdh5.7.0, Java 1.8.0_92, Spark 1.6.0, Spark SQL, R, Python, Scala 2.10.5, MongoDB, Apache Pig 0.12.0, Apache Hive 1.1.0, HDFS, Sqoop, Oozie, Maven, IntelliJ, Git, UNIX shell scripting, Oracle 11g/10g, Log4j, Linux, Agile development
Hadoop/Spark Developer
Confidential - Atlanta, GA
Responsibilities:
- Developed MapReduce jobs to process documents.
- Responsible for the Solr implementation and for setting up collections in SolrCloud.
- Involved in Hadoop cluster setup and configuring Hadoop ecosystem components.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote code to parse external documents before copying them to HDFS.
- Developed Spark scripts using Scala as per requirements.
- Developed and tuned HBase ingestion for documents.
- Developed a web application to interact with Solr for searching documents, ingesting via the SolrJ API.
- Developed Spark jobs using Scala to process locomotive events.
- Responsible for interacting with business partners, gathering requirements and preparing technical design documents.
- Developed the service-oriented architecture (SOA) based design of the application.
- Responsible for writing detailed design documents, class diagrams and sequence diagrams.
- Developed composite components using JSF 2.0.
- Coordinated with the onsite team and clients.
- Prepared and executed unit test cases.
- Involved in integration testing and user acceptance support.
- Involved in production support.
- Collaborate with product/business users, data scientists and other engineers to define requirements to design, build and tune complex solutions.
- Involved in business requirement gathering, analysis and preparation of design documents.
- Involved in preparing Solr collections and creating schemas.
- Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
- Involved in debugging and fine-tuning the Solr cluster and queries.
- Involved in importing document data from external systems to HDFS.
- Developed Spark Streaming applications to process real-time events and ingest emails and instant messages into HBase and Elasticsearch (see the sketch after this list).
- Managed and allocated tasks for onsite and offshore resources.
- Involved in setting up Kerberos and authenticating from the web application.
- Involved in refactoring the existing application to improve its performance.
- Interacted with the client to map legacy data to SCOPE-specific data.
- Developed Java service classes to interface between the application and external systems.
- Wrote SQL queries to create the batch table.
- Involved in the build process and ran the deployment procedure in the UNIX environment on a regular basis.
- Monitored log files on a regular basis in the UNIX environment.
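A minimal sketch of the Kafka-to-Spark-Streaming ingestion described above, using the spark-streaming-kafka-0-10 direct-stream API; the broker address, topic and group id are illustrative, and the println is a placeholder for the actual HBase/Elasticsearch writes:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaEventStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-event-stream-sketch")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",          // illustrative broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-ingest",                   // illustrative group id
      "auto.offset.reset" -> "latest"
    )

    // Direct stream over an illustrative "events" topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Hand each event off to a downstream sink, partition by partition
    stream.foreachRDD { rdd =>
      rdd.map(_.value()).foreachPartition { events =>
        events.foreach(event => println(event)) // placeholder for HBase/Elasticsearch writes
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```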
Environment: Hortonworks Data Platform (HDP 2.3), Hadoop, HDFS, Spark, Kafka, Hive, Solr 5.2.1, HBase, Sqoop, Sun Solaris, Elasticsearch 2.0.0, RSA, PrimeFaces, JSF, RAD 8/8.5, AngularJS, WebSphere Application Server 8/8.5, Java 1.7, Subversion, EJB 3.0, Oracle 11g.
Hadoop Developer
Confidential - Arlington, VA
Responsibilities:
- Installed Hadoop, MapReduce and HDFS, and developed multiple MapReduce jobs in Pig and Hive for data processing.
- Used Impala to read, write and query Hadoop data in HDFS, and configured Kafka to read and write messages from external programs (see the sketch after this list).
- Used Pig as an ETL tool to perform transformations, joins and pre-aggregations before storing data in HDFS.
- Created stored procedures to transform data and worked extensively with SQL for the various transformations needed while loading data.
- Developed various Python scripts to find vulnerabilities in SQL queries through SQL injection tests, permission checks and performance analysis.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames.
- Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Responsible for loading bulk data into HBase using MapReduce by directly creating HFiles and loading them.
- Implemented Spark applications in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce before loading the data into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
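A minimal sketch of writing messages to Kafka from an external program, as described above, using the standard kafka-clients producer API; the broker address, topic name and payload are illustrative:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ExternalEventPublisherSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092") // illustrative broker
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish a message that a downstream Spark/Impala pipeline can consume
      producer.send(new ProducerRecord[String, String](
        "ingest-topic", "key-1", """{"id":1,"status":"ok"}"""))
      producer.flush()
    } finally {
      producer.close()
    }
  }
}
```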
Environment: Cloudera, Hadoop, HDFS, Hive, Impala, Spark SQL, Python, Sqoop, Oozie, Storm, Spark, Scala, MySQL, Shell scripting
Hadoop Developer
Confidential, CA
Responsibilities:
- Created the project using Hive, Big SQL and Pig.
- Involved in data modeling in Hadoop.
- Created Hive tables and worked on them using HiveQL (see the sketch after this list).
- Wrote Apache Pig scripts to process HDFS data.
- Automated tasks using UNIX shell scripts.
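A minimal sketch of the HiveQL table work noted above, issued here through a Spark SparkSession so the example is self-contained; the table definition and HDFS location are illustrative:

```scala
import org.apache.spark.sql.SparkSession

object HiveTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hiveql-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // HiveQL DDL: an external table over data already landed in HDFS
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS clicks (
        |  user_id STRING,
        |  url     STRING,
        |  ts      TIMESTAMP)
        |ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        |LOCATION 'hdfs:///data/raw/clicks'""".stripMargin) // illustrative path

    // HiveQL query over the new table
    spark.sql("SELECT user_id, COUNT(*) AS hits FROM clicks GROUP BY user_id")
      .show(10)

    spark.stop()
  }
}
```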
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Scala, Python, HBase, Oozie, YARN, Spark, Core Java, Oracle, SQL, Ubuntu/UNIX, Eclipse, Maven, JDBC drivers, Mainframe, MySQL, Linux, AWS, XML, CRM, SVN, PDSH, PuTTY, BigInsights