Senior Hadoop Developer Resume
Columbus, OH
SUMMARY:
- Around 9 years of experience in Hadoop/Big Data technologies such as Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Storm, Flink, Flume, Impala, Tez, Kafka, and Spark, with hands-on experience writing MapReduce/YARN and Spark/Scala jobs.
- Solid IT experience with special emphasis on analysis, design, development, and testing of ETL methodologies across all phases of data warehousing.
- Expertise in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as star and snowflake schemas used in relational and dimensional modeling.
- Experience in optimizing and performance-tuning mappings and implementing complex business rules by creating reusable transformations, mapplets, and tasks.
- Worked on creating projections such as query-specific projections, pre-join projections, and live aggregate projections.
- Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
- Queried Vertica and SQL Server for data validation and developed validation worksheets in Excel to validate Tableau dashboards.
- Used various versions of Hive across multiple projects; implemented UDFs and UDAFs in addition to standard queries, and migrated Hive tables and underlying data from Cloudera CDH to Hortonworks HDP.
- Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
- Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.
- Experienced with Tableau Desktop and Tableau Server, with a good understanding of Tableau architecture.
- Experienced in integrating Kafka with Spark Streaming for high-speed data processing (see the sketch following this summary).
- Experience implementing AWS solutions using EC2 and S3, as well as Azure Storage.
- Experienced in developing business reports by writing complex SQL queries using views, macros, volatile and global temporary tables.
- Worked with the AWS team in testing an Apache Spark ETL application on EMR/EC2 using S3.
- Experience in designing both time driven and data driven automated workflows using Oozie.
- Experienced with workflow schedulers and data architecture, including data ingestion pipeline design and data modeling.
- Configured Elasticsearch on Amazon Web Services with static IP authentication security features.
- Experience with the AWS cloud platform and its features, including EC2, AMI, EBS, CloudWatch, AWS Config, Auto Scaling, IAM user management, and AWS S3.
- Managed AWS EC2 instances utilizing Auto Scaling, Elastic Load Balancing and Glacier for our QA and UAT environments as well as infrastructure servers for GI.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
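As an illustration of the Kafka and Spark Streaming integration noted above, here is a minimal sketch of a PySpark Structured Streaming job that consumes a Kafka topic. The broker address, topic name, and console sink are illustrative assumptions rather than details of any specific engagement, and the spark-sql-kafka connector package is assumed to be available at submit time.

```python
# Minimal sketch of Kafka + Spark Structured Streaming integration (PySpark).
# The broker address, topic name, and sink below are placeholder assumptions;
# the spark-sql-kafka connector package must be on the classpath at submit time.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (
    SparkSession.builder
    .appName("kafka-stream-sketch")
    .getOrCreate()
)

# Subscribe to a Kafka topic; Kafka delivers key/value as binary.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                         # placeholder topic
    .load()
)

# Cast the payload to a string so downstream stages can parse it.
events = raw.select(col("value").cast("string").alias("payload"))

# Write to the console sink for demonstration; a real pipeline would land
# the stream in HDFS, Hive, or HBase instead.
query = (
    events.writeStream
    .format("console")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```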
TECHNICAL SKILLS:
Big Data Technologies: Hadoop, Spark, Kafka, Flume, HDFS, Hive, Impala, MapReduce, Sqoop, Oozie.
Distribution: Cloudera, Hortonworks
Programming Languages: Python, Scala and Java
Web Technologies: HTML, J2EE, CSS, JavaScript, Servlets, JSP, XML, AWS, EC2, S3
Databases: DB2, MySQL, HBase, Cassandra
DB Languages: SQL, PL/SQL.
Operating Systems: Linux, UNIX, Windows
IDE/Testing Tools: Eclipse, IntelliJ, PyCharm
PROFESSIONAL EXPERIENCE:
Senior Hadoop Developer
Confidential, Columbus, OH
Responsibilities:
- Designed data ingestion and integration processes using Sqoop, shell scripts, and Pig, together with Hive.
- Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
- Implemented the Fair Scheduler on the ResourceManager to share cluster resources among users' MRv2 jobs.
- Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Performed investigation and migration from MRv1 to MRv2.
- Developed PySpark code to read data from Hive, group the fields, and generate XML files; enhanced the code to write the generated XML files to a directory and zip them into CDAs (see the sketch following this project entry).
- Worked with Big Data Analysts, Designers and Scientists in troubleshooting MRv1/MRv2 job failures and issues with Hive, Pig, Flume, and Apache Spark.
- Utilized Apache Spark for Interactive Data Mining and Data Processing.
- Buffered incoming load with Apache Kafka, a fast, scalable, fault-tolerant system, before the data was analyzed.
- Analyzed SQL scripts and designed the solution to be implemented using PySpark.
- Configured Sqoop to import and export data between HDFS and RDBMS.
- Handled data exchange between HDFS, web applications, and databases using Flume and Sqoop.
- Used Hive, created Hive tables, and was involved in data loading.
- Extensively involved in querying using Hive and Pig.
- Developed an open-source Impala/Hive Liquibase plug-in for schema migration in CI/CD pipelines.
- Involved in writing custom UDFs to extend Pig core functionality.
- Involved in writing custom MapReduce jobs using the Java API.
- Familiarity with NoSQL databases including HBase and Cassandra.
- Implemented the Cassandra connection with Resilient Distributed Datasets (RDDs).
- Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Set up automated processes to analyze system and Hadoop log files for predefined errors and send alerts to the appropriate groups.
- Set up automated processes to archive/clean unwanted data on the cluster, on the NameNode and Standby node.
- Created Gradle and Maven builds to build and deploy Spring Boot microservices to the internal enterprise Docker registry.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Documented system processes and procedures for future reference.
- Supported technical team members in management and review of Hadoop log files and data backups.
- Designed target tables per the reporting team's requirements and designed Extraction, Transformation, and Loading (ETL) processes using Talend.
- Implemented File Transfer Protocol operations using Talend Studio to transfer files in between network folders.
- Participated in development and execution of system and disaster recovery processes.
- Experience with AWS cloud services such as EC2, ELB, RDS, ElastiCache, Route 53, and EMR.
- Hands-on experience in cloud configuration for Amazon Web Services (AWS) and with container technologies such as Docker, embedding containers in existing CI/CD pipelines.
- Set up an independent testing lifecycle for CI/CD scripts with Vagrant and VirtualBox.
Environment: Hadoop, MapReduce2, Hive, Pig, HDFS, Sqoop, Oozie, Microservices, Talend, PySpark, CDH, Flume, Kafka, Spark, HBase, ZooKeeper, Impala, LDAP, NoSQL, MySQL, Infobright, Linux, AWS, Ansible, Puppet, Chef.
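A simplified sketch of the PySpark Hive-to-XML step described in this project follows. The database, table, and column names and the output directory are hypothetical placeholders, and the XML element layout is illustrative rather than the actual CDA format.

```python
# Simplified sketch of the PySpark Hive-to-XML step described above.
# Table name, grouping column, and output directory are hypothetical; the
# element names are illustrative and not the actual CDA layout.
import os
import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-xml-sketch")
    .enableHiveSupport()   # required so spark.table() can see the Hive metastore
    .getOrCreate()
)

OUTPUT_DIR = "/tmp/xml_out"   # placeholder output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Read a Hive table and group rows by a key column (both names are placeholders).
df = spark.table("claims_db.claims")
grouped = df.groupBy("member_id").agg({"claim_amount": "sum"})

# Build one small XML document per group on the driver; fine for modest result
# sizes, while very large outputs would need a distributed XML writer instead.
for row in grouped.collect():
    root = ET.Element("member")
    ET.SubElement(root, "id").text = str(row["member_id"])
    ET.SubElement(root, "totalClaimAmount").text = str(row["sum(claim_amount)"])
    ET.ElementTree(root).write(os.path.join(OUTPUT_DIR, f"{row['member_id']}.xml"))

spark.stop()
```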
Hadoop/Spark/Big Data Consultant
Confidential, Des Moines, IA
Responsibilities:
- Performed performance optimizations on Spark/Scala jobs.
- Used Spark as an ETL tool.
- Implemented best offer logic using Pig scripts and Pig UDFs.
- Analyzed large volumes of structured data using Spark SQL.
- Responsible for loading data from external systems and parsing and cleaning data for data scientists.
- Created Docker images for Spark and Postgres.
- Worked on analyzing the Hadoop cluster and various big data analytics tools, including Hive, Spark, Python, Sqoop, Flume, and Oozie.
- Avoided MapReduce by using PySpark, boosting performance roughly 3x (see the sketch following this project entry).
- Used Spark SQL to access Hive tables for analytics and fast data processing.
- Imported data from a Postgres database into Hive using Sqoop with optimized techniques.
- Developed a Cassandra application to ingest log and time-series data.
- Developed a Spark Streaming application to process real-time events.
- Researched customer needs and developed applications accordingly.
- Developed microservices using Spring Boot to interact with MongoDB for storing analytical configurations.
- Built a Cassandra cluster in the AWS environment.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Fine-tuned Cassandra and Spark clusters and Hive queries.
- Traveled to customer sites and identified current drilling issues.
- Responsible for creating and maintaining the microservices, Postgres, and RabbitMQ services in cloud environments (GE Predix, AWS, and Azure).
- Migrated the required data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
- Developed Spark scripts by using Scala as per the requirement.
- Involved in business requirement gathering, analysis and preparing design documents.
- Developed MapReduce jobs to ingest data into HBase and index it into Solr.
- Involved in Solr collection preparation and schema creation.
- Developed Spark jobs using Scala for processing locomotive events.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Involved in debugging and fine-tuning the Solr cluster and queries.
- Designed and developed applications using the SolrJ API to index and search documents.
- Involved in importing document data from external systems into HDFS.
- Developed Spark Streaming applications to ingest emails and instant messages into HBase and Elasticsearch.
- Involved in troubleshooting performance issues and tuning the Hadoop cluster.
- Wrote code to interact with HBase using the HBase Java client API.
- Managed and allocated tasks for onsite and offshore resources.
- Involved in setting up Kerberos and authenticating from the web application.
- Involved in developing Spark code using Scala and Spark-SQL for faster testing and data processing, and explored optimizations using SparkContext, Spark-SQL, pair RDDs, and Spark on YARN.
Environment: HDP 2.6, AWS, Azure, Cassandra, RabbitMQ, Postgres, Spark, Hive, Elasticsearch, Hadoop, HDFS, Docker, Sqoop, MongoDB, Spring Boot, Swagger
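As a minimal illustration of replacing a MapReduce-style aggregation with Spark SQL over Hive tables in PySpark, the sketch below expresses the whole map/shuffle/reduce flow as one declarative aggregation; the database, table, and column names are hypothetical.

```python
# Minimal sketch of replacing a MapReduce-style aggregation with Spark SQL
# over Hive tables (PySpark). Table and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("spark-sql-over-hive-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# What a multi-stage MapReduce job would express as map + shuffle + reduce
# becomes a single declarative aggregation pushed through Spark's optimizer.
daily_events = (
    spark.table("logs_db.events")                    # placeholder Hive table
    .where(F.col("event_date") >= "2018-01-01")      # partition-friendly filter
    .groupBy("event_date", "event_type")
    .agg(
        F.count("*").alias("event_count"),
        F.countDistinct("user_id").alias("unique_users"),
    )
)

# Persist the result back to Hive for downstream reporting (placeholder table).
daily_events.write.mode("overwrite").saveAsTable("logs_db.daily_event_summary")

spark.stop()
```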
Sr. Hadoop Developer
Confidential, Tampa, FL
Responsibilities:
- Experience with the complete SDLC process: staging, code reviews, source code management, and the build process
- Implemented Big Data platforms as data storage, retrieval and processing systems
- Developed data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis
- Wrote Sqoop scripts for importing and exporting data into HDFS and Hive
- Wrote MapReduce jobs to discover trends in data usage by the users
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig
- Experienced working with Pig to perform transformations, event joins, filtering, and some pre-aggregations before storing the data in HDFS
- Involved in developing Hive UDFs for functionality not available out of the box in Hive
- Created sub-queries for filtering and faster query execution
- Experienced in migrating HiveQL to Impala to minimize query response time
- Used HCatalog to access Hive table metadata from MapReduce and Pig scripts
- Experience loading and transforming large amounts of structured and unstructured data into HBase, with exposure to handling automatic failover in HBase
- Ran POCs in Spark to benchmark the implementation
- Developed Spark jobs using Scala in test environment for faster data processing and querying
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala
- Used Python for pattern matching in build logs to format warnings and errors (see the sketch following this project entry)
- Configured big data workflows to run on top of Hadoop using Oozie; these workflows comprise heterogeneous jobs such as Pig, Hive, and Sqoop, with cluster coordination services through ZooKeeper
- Hands-on experience in Tableau for data visualization and analysis of large data sets, drawing various conclusions
- Involved in developing a test framework for data profiling and validation using interactive queries, collecting all test results into audit tables for comparing results over time
- Documented all requirements, code, and implementation methodologies for review and analysis purposes
- Extensively used GitHub as the code repository and Phabricator for managing the day-to-day development process and tracking issues
Environment: Java, Scala, Hadoop, Spark, HDFS, MapReduce, YARN, Hive, Pig, Impala, Oozie, Sqoop, Flume, Kafka, Teradata, SQL, GitHub, Phabricator, Amazon Web Services
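As a small illustration of the Python log-scanning step mentioned above, the sketch below matches warning and error lines in a build log and reformats them into a uniform report. The log path and message patterns are assumptions, since real build tools vary in their output format.

```python
# Small sketch of scanning build logs for warnings and errors with Python
# regular expressions. The log path and message formats are assumptions;
# real build tools will need their own patterns.
import re
import sys
from collections import Counter

# Match lines like "[ERROR] something broke" or "warning: deprecated API".
PATTERNS = {
    "error": re.compile(r"\[?ERROR\]?[:\s]+(?P<msg>.+)", re.IGNORECASE),
    "warning": re.compile(r"\[?WARN(?:ING)?\]?[:\s]+(?P<msg>.+)", re.IGNORECASE),
}

def scan_log(path):
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for lineno, line in enumerate(handle, start=1):
            for level, pattern in PATTERNS.items():
                match = pattern.search(line)
                if match:
                    counts[level] += 1
                    # Reformat the hit into a compact, uniform report line.
                    print(f"{level.upper():7} line {lineno}: {match.group('msg').strip()}")
                    break
    return counts

if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "build.log"  # placeholder default
    totals = scan_log(log_path)
    print(f"summary: {totals['error']} errors, {totals['warning']} warnings")
```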