
Senior Hadoop Developer Resume


Columbus, OH

SUMMARY:

  • Around 9 years of experience in Hadoop/Big Data technologies such as Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Storm, Flink, Flume, Impala, Tez, Kafka, and Spark, with hands-on experience writing MapReduce/YARN and Spark/Scala jobs.
  • Good IT experience with special emphasis on analysis, design, development, and testing of ETL methodologies across all phases of data warehousing.
  • Expertise in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as star schema and snowflake schema used in relational and dimensional modeling.
  • Experience in optimizing and performance tuning of Mappings and implementing complex business rules by creating reusable Transformations, Mapplets, and Tasks.
  • Worked on creating projections such as query-specific projections, pre-join projections, and live aggregate projections.
  • Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Queried Vertica and SQL Server for data validation and developed validation worksheets in Excel to validate Tableau dashboards.
  • Used various versions of Hive on multiple projects; beyond regular queries, implemented UDFs and UDAFs and migrated Hive tables and underlying data from Cloudera CDH to Hortonworks HDP.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.
  • Experienced with Tableau Desktop and Tableau Server, with a good understanding of Tableau architecture.
  • Experienced in integrating Kafka with Spark Streaming for high-speed data processing (see the streaming sketch after this list).
  • Experience in implementing AWS solutions using EC2 and S3, as well as Azure storage.
  • Experienced in developing business reports by writing complex SQL queries using views, macros, volatile and global temporary tables.
  • Worked with the AWS team to test our Apache Spark ETL application on EMR/EC2 using S3.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experienced with workflow schedulers and data architecture, including data ingestion pipeline design and data modeling.
  • Configured Elasticsearch on Amazon Web Services with static IP authentication security features.
  • Experience with the AWS Cloud platform and its features, including EC2, AMI, EBS, CloudWatch, AWS Config, Auto Scaling, IAM user management, and S3.
  • Managed AWS EC2 instances utilizing Auto Scaling, Elastic Load Balancing and Glacier for our QA and UAT environments as well as infrastructure servers for GI.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
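
A minimal Scala sketch of the Kafka with Spark Streaming integration referenced above, assuming the spark-streaming-kafka-0-10 connector; the broker address, topic name, and consumer group are placeholders rather than details from any actual project:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-spark-streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    // Placeholder broker, consumer group, and deserializer settings.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "event-consumer",
      "auto.offset.reset"  -> "latest"
    )

    // Subscribe to a placeholder topic using the direct-stream API.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Count events per key in each micro-batch and print a sample to the driver log.
    stream.map(record => (record.key, 1L))
          .reduceByKey(_ + _)
          .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```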

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, Spark, Kafka, Flume, HDFS, Hive, Impala, MapReduce, Sqoop, Oozie.

Distribution: Cloudera, Hortonworks

Programming Languages: Python, Scala and Java

Web Technologies: HTML, J2EE, CSS, JavaScript, Servlets, JSP, XML

Cloud: AWS (EC2, S3)

Databases: DB2, MySQL, HBase, Cassandra

DB Languages: SQL, PL/SQL.

Operating Systems: Linux, UNIX, Windows

IDE/Testing Tools: Eclipse, IntelliJ, PyCharm

PROFESSIONAL EXPERIENCE:

Senior Hadoop Developer

Confidential, Columbus, OH

Responsibilities:

  • Designed data ingestion and integration processes using Sqoop, shell scripts, and Pig, with Hive.
  • Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
  • Implemented the Fair Scheduler on the ResourceManager to share cluster resources among users' MRv2 jobs.
  • Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Performed investigation and migration from MRv1 to MRv2.
  • Developed PySpark code to read data from Hive, group the fields, and generate XML files. Enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Worked with Big Data Analysts, Designers and Scientists in troubleshooting MRv1/MRv2 job failures and issues with Hive, Pig, Flume, and Apache Spark.
  • Utilized Apache Spark for Interactive Data Mining and Data Processing.
  • Buffered incoming load with Apache Kafka, a fast, scalable, fault-tolerant system, before the data was analyzed.
  • Analyzed the SQL scripts and designed the solution for implementation in PySpark.
  • Configured Sqoop to import and export data between HDFS and RDBMS.
  • Handled data exchange between HDFS, web applications, and databases using Flume and Sqoop.
  • Used Hive, created Hive tables, and was involved in data loading.
  • Extensively involved in querying using Hive and Pig.
  • Developed an open-source Impala/Hive Liquibase plug-in for schema migration in CI/CD pipelines.
  • Involved in writing custom UDFs to extend Pig core functionality.
  • Involved in writing custom MR jobs using the Java API.
  • Familiarity with NoSQL databases including HBase and Cassandra.
  • Implemented the Cassandra connection over Spark Resilient Distributed Datasets (RDDs); see the connector sketch after this list.
  • Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting; a related sketch also follows this list.
  • Setup automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
  • Setup automated processes to archive/clean the unwanted data on the cluster, on Name node and Standby node.
  • Created Gradle and Maven builds to build and deploy Spring Boot microservices to an internal enterprise Docker registry.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Documented system processes and procedures for future reference.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Designed target tables as per the requirement from the reporting team and designed Extraction, Transformation and Loading (ETL) using Talend.
  • Implemented File Transfer Protocol operations using Talend Studio to transfer files in between network folders.
  • Participated in development and execution of system and disaster recovery processes.
  • Experience with AWS cloud services such as EC2, ELB, RDS, ElastiCache, Route 53, and EMR.
  • Hands-on experience in cloud configuration for Amazon Web Services (AWS) and with container technologies such as Docker, embedding containers into existing CI/CD pipelines.
  • Set up an independent testing lifecycle for CI/CD scripts with Vagrant and VirtualBox.
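
The Cassandra-over-RDD connection called out above is commonly wired through the DataStax spark-cassandra-connector; the bullet also mentions a Java service layer, but the sketch below shows only the RDD side in Scala. The contact point, keyspace, table, and column names are hypothetical:

```scala
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CassandraRddSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder Cassandra contact point.
    val conf = new SparkConf()
      .setAppName("cassandra-rdd-sketch")
      .set("spark.cassandra.connection.host", "cassandra-host")
    val sc = new SparkContext(conf)

    // Read a Cassandra table as an RDD of CassandraRow (keyspace/table are illustrative).
    val orders = sc.cassandraTable("commerce", "orders")

    // Aggregate order totals per customer and write back to a summary table.
    orders.map(row => (row.getString("customer_id"), row.getDouble("amount")))
          .reduceByKey(_ + _)
          .saveToCassandra("commerce", "order_totals", SomeColumns("customer_id", "total"))

    sc.stop()
  }
}
```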
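
For the partitioned and bucketed Hive analysis above, the reporting metrics could be computed in HiveQL directly or, as sketched here, through Spark SQL with Hive support enabled; the database, table, column names, and date filter are illustrative only, and the table is assumed to already exist partitioned by txn_date and bucketed by account_id:

```scala
import org.apache.spark.sql.SparkSession

object HiveMetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-metrics-sketch")
      .enableHiveSupport() // read tables registered in the Hive metastore
      .getOrCreate()

    // Filtering on the partition column prunes partitions; grouping yields per-day metrics.
    val metrics = spark.sql(
      """SELECT txn_date,
        |       COUNT(*)    AS txn_count,
        |       SUM(amount) AS total_amount
        |FROM reporting.txn_events
        |WHERE txn_date >= '2020-01-01'
        |GROUP BY txn_date""".stripMargin)

    metrics.show(20, truncate = false)
    spark.stop()
  }
}
```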

Environment: Hadoop, MapReduce2, Hive, Pig, HDFS, Sqoop, Oozie, Microservices, Talend, PySpark, CDH, Flume, Kafka, Spark, HBase, ZooKeeper, Impala, LDAP, NoSQL, MySQL, Infobright, Linux, AWS, Ansible, Puppet, Chef.

Hadoop/Spark/Big Data Consultant

Confidential, Des Moines, IA

Responsibilities:

  • Performance optimizations on Spark/Scala.
  • Used Spark as an ETL tool.
  • Implemented best offer logic using Pig scripts and Pig UDFs.
  • Analyzed large volumes of structured data using Spark SQL.
  • Responsible for loading data from external systems and parsing and cleaning data for data scientists
  • Created Docker images for Spark and Postgres
  • Worked on analyzing the Hadoop cluster and different big data analytic tools including Hive, Spark, Python, Sqoop, Flume, and Oozie.
  • Avoided MapReduce by using PySpark, boosting performance by 3x.
  • Used Spark SQL to access Hive tables for analytical and fast data processing
  • Imported data from the Postgres database to Hive using Sqoop with optimized techniques
  • Developed a Cassandra application to ingest log and time-series data.
  • Developed a Spark Streaming application to process real-time events
  • Researched customer needs and developed applications accordingly
  • Developed microservices using a Spring Boot API to interact with MongoDB to store analytical configurations.
  • Built a Cassandra cluster in the AWS environment.
  • Developed scripts and batch jobs to schedule various Hadoop programs
  • Fine-tuned Cassandra and Spark clusters and Hive queries
  • Traveled to customer sites and identified current drilling issues.
  • Responsible for creating and maintaining the microservices, Postgres, and RabbitMQ services in cloud environments (GE Predix, AWS, and Azure)
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Developed Spark scripts by using Scala as per the requirement.
  • Involved in business requirement gathering, analysis and preparing design documents.
  • Developed MapReduce jobs to ingest data into HBase and index into SOLR
  • Involved in preparing SOLR collection and schema creation.
  • Developed Spark jobs using Scala for processing locomotive events
  • Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
  • Involved in debugging and fine-tuning the SOLR cluster and queries
  • Designed and developed applications using the SOLRJ API to index and search documents
  • Involved in importing document data from external systems to HDFS
  • Developed Spark streaming applications to ingest emails and instant messages into HBase and Elasticsearch.
  • Involved in troubleshooting performance issues and tuning the Hadoop cluster
  • Wrote code to interact with HBase using the HBase Java client API
  • Managing and allocating tasks for onsite and offshore resources
  • Involved in setting up Kerberos and authenticating from web application
  • Involved in developing Spark code using Scala and Spark-SQL for faster testing and processing of data, and explored optimizing it using Spark Context, Spark-SQL, pair RDDs, and Spark on YARN (a sketch follows this list).
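
A minimal Scala sketch combining Spark-SQL and pair RDD processing of the kind described above; the table and column names are illustrative, and the job is assumed to be submitted to YARN via spark-submit:

```scala
import org.apache.spark.sql.SparkSession

object EventMetricsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-metrics-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Spark SQL side: pull raw events from a Hive table (names are placeholders).
    val events = spark.sql("SELECT event_type, duration_ms FROM telemetry.locomotive_events")

    // Pair RDD side: aggregate total duration per event type with reduceByKey.
    val totals = events.rdd
      .map(row => (row.getString(0), row.getLong(1)))
      .reduceByKey(_ + _)

    totals.take(20).foreach { case (eventType, total) =>
      println(s"$eventType -> $total ms")
    }

    spark.stop()
  }
}
```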

Environment: HDP 2.6, AWS, Azure, Cassandra, RabbitMQ, Postgres, Spark, Hive, Elasticsearch, Hadoop, HDFS, Docker, Sqoop, MongoDB, Spring Boot, Swagger

Sr. Hadoop Developer

Confidential, Tampa, FL

Responsibilities:

  • Experience with the complete SDLC process, including staging, code reviews, source code management, and the build process
  • Implemented Big Data platforms as data storage, retrieval and processing systems
  • Developed data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
  • Wrote Sqoop scripts for importing and exporting data into HDFS and Hive
  • Wrote MapReduce jobs to discover trends in data usage by the users
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig
  • Experienced working with Pig to do transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS
  • Involved in developing Hive UDFs for needed functionality that is not available out of the box in Hive
  • Created sub-queries for filtering and faster query execution
  • Experienced in migrating HiveQL into Impala to minimize query response time
  • Used HCatalog to access Hive table metadata from MapReduce and Pig scripts
  • Experience loading and transforming large amounts of structured and unstructured data into HBase, with exposure to handling automatic failover in HBase
  • Ran POCs in Spark to benchmark the implementation
  • Developed Spark jobs using Scala in test environment for faster data processing and querying
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (see the sketch after this list)
  • Used Python for pattern matching in build logs to format warnings and errors
  • Configured big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Pig, Hive, and Sqoop, with cluster coordination services through ZooKeeper
  • Hands-on experience in Tableau for data visualization and analysis on large data sets, drawing various conclusions
  • Involved in developing a test framework for data profiling and validation using interactive queries, and collected all the test results into audit tables for comparing results over time
  • Documented all the requirements, code, and implementation methodologies for review and analysis purposes
  • Extensively used GitHub as a code repository and Phabricator for managing the day-to-day development process and keeping track of issues
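
As a flavor of the MapReduce-to-Spark migration mentioned above, an aggregation that would need a mapper and reducer in MapReduce collapses into a few RDD transformations in Scala; the HDFS paths and log layout are placeholders, not details from the actual project:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object UsageTrendsSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("mr-to-spark-sketch"))

    // Placeholder input: each line is assumed to hold a user id and an action, tab-separated.
    val logs = sc.textFile("hdfs:///data/usage/logs")

    // The former map phase becomes map(), the former reduce phase becomes reduceByKey().
    val usageByUser = logs
      .map(_.split("\t"))
      .filter(_.length >= 2)
      .map(fields => (fields(0), 1L))
      .reduceByKey(_ + _)

    usageByUser.saveAsTextFile("hdfs:///data/usage/trends")

    sc.stop()
  }
}
```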

Environment: Java, Scala, Hadoop, Spark, HDFS, MapReduce, YARN, Hive, Pig, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Phabricator, Amazon Web Services
