
Senior Hadoop Developer Resume

Columbus, Ohio

PROFESSIONAL SUMMARY:

  • Around 9 years of experience in Hadoop/Big Data technologies including Hadoop, Pig, Hive, HBase, Oozie, ZooKeeper, Sqoop, Storm, Flink, Flume, Impala, Tez, Kafka, and Spark, with hands-on experience writing MapReduce/YARN and Spark/Scala jobs.
  • Good IT experience with special emphasis on analysis, design, development, and testing of ETL methodologies across all phases of data warehousing.
  • Expertise in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as star and snowflake schemas used in relational and dimensional modeling.
  • Experience in optimizing and performance-tuning mappings and implementing complex business rules by creating reusable transformations, mapplets, and tasks.
  • Worked on creating projections such as query-specific projections, pre-join projections, and live aggregate projections.
  • Responsible for developing data pipelines using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
  • Queried Vertica and SQL Server for data validation, and developed validation worksheets in Excel to validate Tableau dashboards.
  • Used various versions of Hive on multiple projects; in addition to regular queries, implemented UDFs and UDAFs, and worked on a project that involved migrating Hive tables and underlying data from Cloudera CDH to Hortonworks HDP.
  • Developed Spark code using Scala and Spark-SQL for faster testing and data processing.
  • Extensively used SQL and PL/SQL for development of Procedures, Functions, Packages and Triggers.
  • Experienced with Tableau Desktop and Tableau Server, with a good understanding of Tableau architecture.
  • Experienced in integrating Kafka with Spark Streaming for high-speed data processing (a sketch of this pattern follows this list).
  • Experience implementing cloud solutions using AWS EC2, S3, and Azure Storage.
  • Experienced in developing business reports by writing complex SQL queries using views, macros, volatile and global temporary tables.
  • Worked with the AWS team in testing our Apache Spark ETL application on EMR/EC2 using S3.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experienced with workflow schedulers and data architecture, including data ingestion pipeline design and data modeling.
  • Configured Elasticsearch on Amazon Web Services with static IP authentication security features.
  • Experience with the AWS Cloud platform and its features, including EC2, AMI, EBS, CloudWatch, AWS Config, Auto Scaling, IAM user management, and S3.
  • Managed AWS EC2 instances utilizing Auto Scaling, Elastic Load Balancing and Glacier for our QA and UAT environments as well as infrastructure servers for GI.
  • Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
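
A minimal, hypothetical PySpark sketch of the Kafka-to-Spark Streaming ingest pattern referenced in the bullets above. The broker address, topic name, event schema, and paths are illustrative placeholders rather than details from these projects, and the job assumes the spark-sql-kafka connector package is available on the cluster.

```python
# Hypothetical Kafka -> Spark Structured Streaming ingest sketch (PySpark).
# Broker, topic, schema, and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Assumed JSON layout of the incoming events.
schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", LongType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
       .option("subscribe", "events")                        # placeholder topic
       .load())

# Kafka delivers the payload as bytes; cast to string and parse the JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
             .select(from_json(col("json"), schema).alias("e"))
             .select("e.*"))

# Land the parsed events as Parquet for downstream Hive/Spark SQL analysis.
query = (events.writeStream
         .format("parquet")
         .option("path", "/data/events")            # placeholder output path
         .option("checkpointLocation", "/tmp/chk")  # placeholder checkpoint dir
         .outputMode("append")
         .start())
query.awaitTermination()
```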

TECHNICAL SKILLS:

Specialties: Data warehousing/ETL/BI Concepts, Data Architecture, Software Development methodologies, Data Modeling

Business Tools: Tableau 10.x, Business Objects XI R2, Informatica PowerCenter 8.x, OLAP/OLTP, Talend, Teradata 13.x, Teradata SQL Assistant

Big Data: Hadoop, MapReduce 1.0/2.0, Pig, Hive, HBase, Sqoop, Oozie, ZooKeeper, Kafka, Spark, Flume, Storm, Impala, Mahout, Hue, Tez, HCatalog, Cassandra

Cloud Technologies: AWS EMR, AWS S3, Glue Data Catalog, Kinesis, Lambda, ELK (Elasticsearch, Logstash, Kibana) Stack, CloudWatch metrics, Azure

Databases: DB2, MySQL, MS SQL Server, Vertica, MongoDB, Oracle, SQL Server 2008

Hadoop Distributions: Hortonworks, Cloudera

Languages: Python, Java / J2EE, Scala, HTML, SQL, JDBC, JavaScript, PHP

Operating System: Mac OS, Unix, Linux (Various Versions), Windows 2003/7/8/8.1/XP

Web Development: HTML, JavaScript, XML, PHP, JSP, Servlets

Application Servers: Apache Tomcat, WebLogic, WebSphere; Tools: Eclipse, NetBeans

PROFESSIONAL EXPERIENCE:

Confidential, Columbus, Ohio

Senior Hadoop developer

Responsibilities:

  • Designed data ingestion and integration processes using Sqoop, shell scripts, Pig, and Hive.
  • Added and decommissioned Hadoop cluster nodes, including balancing HDFS block data.
  • Implemented the Fair Scheduler on the ResourceManager to share cluster resources among users' MRv2 jobs.
  • Worked with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
  • Performed investigation and migration from MRv1 to MRv2.
  • Developed PySpark code to read data from Hive, group the fields, and generate XML files (a sketch of this step follows this list); enhanced the PySpark code to write the generated XML files to a directory and zip them into CDAs.
  • Worked with Big Data Analysts, Designers and Scientists in troubleshooting MRv1/MRv2 job failures and issues with Hive, Pig, Flume, and Apache Spark.
  • Utilized Apache Spark for Interactive Data Mining and Data Processing.
  • Accommodated load before the data is analyzed using Apache Kafka, a fast, scalable, fault-tolerant system.
  • Analyzed the SQL scripts and designed the solution to implement them using PySpark.
  • Configured Sqoop to import and export data between HDFS and RDBMSs.
  • Handled data exchange between HDFS, web applications, and databases using Flume and Sqoop.
  • Used Hive and created Hive tables involved in data loading.
  • Extensively involved in querying using Hive and Pig.
  • Developed an open-source Impala/Hive Liquibase plug-in for schema migration in CI/CD pipelines.
  • Involved in writing custom UDFs to extend Pig core functionality.
  • Involved in writing custom MapReduce jobs using the Java API.
  • Familiar with NoSQL databases including HBase and Cassandra.
  • Implemented Cassandra connection with the Resilient Distributed Datasets.
  • Designed and developed a Java API (Commerce API) that provides functionality to connect to Cassandra through Java services.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Setup automated processes to analyze the System and Hadoop log files for predefined errors and send alerts to appropriate groups.
  • Setup automated processes to archive/clean the unwanted data on the cluster, on Name node and Standby node.
  • Created Gradle and Maven builds to build and deploy Spring Boot microservices to the internal enterprise Docker registry.
  • Involved in analyzing system failures, identifying root causes, and recommending courses of action.
  • Documented system processes and procedures for future reference.
  • Supported technical team members in management and review of Hadoop log files and data backups.
  • Designed target tables as per the requirement from the reporting team and designed Extraction, Transformation and Loading (ETL) using Talend.
  • Implemented File Transfer Protocol operations using Talend Studio to transfer files in between network folders.
  • Participated in development and execution of system and disaster recovery processes.
  • Experience with AWS cloud services such as EC2, ELB, RDS, ElastiCache, Route 53, and EMR.
  • Hands-on experience in cloud configuration for Amazon Web Services (AWS).
  • Hands-on experience with container technologies such as Docker, embedding containers in existing CI/CD pipelines.
  • Set up an independent testing lifecycle for CI/CD scripts with Vagrant and VirtualBox.
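
A minimal sketch of the PySpark "read from Hive, group the fields, generate XML" step referenced above. The table, columns, grouping key, and output path are hypothetical; the real grouping keys and XML layout would follow the project's schema.

```python
# Hypothetical Hive -> grouped XML sketch (PySpark). All names are placeholders.
import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-to-xml-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read the source Hive table and group the fields of interest per account.
df = spark.sql("SELECT account_id, field_name, field_value FROM staging.records")
grouped = (df.rdd
             .map(lambda r: (r.account_id, (r.field_name, r.field_value)))
             .groupByKey())

def to_xml(pair):
    """Render one account's grouped fields as a small XML document string."""
    account_id, fields = pair
    root = ET.Element("account", id=str(account_id))
    for name, value in fields:
        ET.SubElement(root, "field", name=str(name)).text = str(value)
    return ET.tostring(root, encoding="unicode")

# One XML document per output line; a follow-on step could zip these files.
grouped.map(to_xml).saveAsTextFile("/data/out/xml")  # placeholder output path
```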

Environment: Hadoop, MapReduce 2, Hive, Pig, HDFS, Sqoop, Oozie, microservices, Talend, PySpark, CDH, Flume, Kafka, Spark, HBase, ZooKeeper, Impala, LDAP, NoSQL, MySQL, Infobright, Linux, AWS, Ansible, Puppet, Chef.

Confidential, Atlanta, GA

Senior Hadoop/Spark Developer

Responsibilities:

  • Worked on different tools around Presto to process large datasets.
  • Worked on core tables of the Revenue Data Feed (RDF) that calculates revenue for Facebook advertisers.
  • Involved in testing and migration to Presto.
  • Worked extensively with Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Experienced with the tools in Hadoop Ecosystem including Pig, Hive, HDFS, Sqoop, Spark, Yarn and Oozie.
  • Involved in importing real-time data into Hadoop using Kafka and implemented an Oozie job to run daily.
  • Experienced in writing complex SQL Queries, Stored Procedures, Triggers, Views, Cursors, Joins, Constraints, DDL, DML and User Defined Functions to implement the business logic.
  • Developed Custom ETL Solution, Batch processing and Real-Time data ingestion pipeline to move data in and out of Hadoop using Python and shell Script.
  • Experience in Large Data processing and transformation using Hadoop-Hive and Sqoop.
  • Built real-time predictive analytics capabilities using Spark Streaming, Spark SQL, and Oracle Data Mining tools.
  • Experience with Tableau for Data Acquisition and visualizations.
  • Worked with the AWS team in testing our Apache Spark ETL application on EMR/EC2 using S3.
  • Assisted in data analysis, star schema data modeling and design specific to data warehousing and business intelligence environment.
  • Expertise in platform related Hadoop Production support tasks by analyzing the job logs.
  • Monitored System health and logs and responded accordingly to any warning or failure conditions.
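
An illustrative sketch of converting a Hive/SQL aggregation into equivalent Spark DataFrame transformations, as referenced above. The project itself used Scala; PySpark is shown here for consistency with the other sketches, and the table and column names are placeholders.

```python
# Hypothetical Hive SQL -> Spark DataFrame transformation sketch (PySpark).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("sql-to-transformations-sketch")
         .enableHiveSupport()
         .getOrCreate())

# The original Hive-style aggregation ...
daily_revenue_sql = spark.sql("""
    SELECT advertiser_id, dt, SUM(revenue) AS total_revenue
    FROM ads.revenue_feed
    WHERE dt = '2017-01-01'
    GROUP BY advertiser_id, dt
""")

# ... and the same logic expressed as DataFrame transformations.
revenue = spark.table("ads.revenue_feed")
daily_revenue_df = (revenue
                    .filter(F.col("dt") == "2017-01-01")
                    .groupBy("advertiser_id", "dt")
                    .agg(F.sum("revenue").alias("total_revenue")))

# Both produce the same result; the DataFrame form is easier to unit test.
daily_revenue_df.write.mode("overwrite").parquet("/data/out/daily_revenue")
```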

Environment: Amazon Web Services, Vertica, Informatica PowerCenter, Spark, Kafka, AWS S3, Apache Hadoop, Hive, Pig, shell scripting, ETL, Tableau, Agile methodology.

Confidential, Texas

Hadoop Developer

Responsibilities:

  • Worked with variables and parameter files and designed an ETL framework that creates parameter files to make it dynamic.
  • Currently working on the Teradata to HP Vertica data migration project; worked extensively with the COPY command to load data from files into Vertica, monitored the ETL process jobs, and validated the data loaded into the Vertica DW.
  • Built a full-service catalog system with a complete workflow using Elasticsearch, Logstash, Kibana, Kinesis, and CloudWatch.
  • Responsible for data extraction and data ingestion from different data sources into Hadoop Data Lake by creating ETL pipelines using Pig, and Hive.
  • Experienced in transferring data from different data sources into HDFS using Kafka producers, consumers, and brokers.
  • Preprocessed logs and semi-structured content stored on HDFS using Pig and imported the processed data into the Hive warehouse, enabling business analysts to write Hive queries.
  • Worked on data migration from Hadoop clusters to the cloud; good knowledge of cloud components such as AWS S3, EMR, ElastiCache, and EC2.
  • Responsible for writing Hive and Pig scripts as ETL tools to perform transformations, event joins, traffic filtering, and some pre-aggregations before storing the data in HDFS; developed Vertica UDFs to preprocess the data for analysis.
  • Designed the reporting application that uses Spark SQL to fetch and generate reports on HBase.
  • Built a custom batch aggregation framework for creating reporting aggregates in Hadoop.
  • Experience working with the Hive data warehouse tool: creating tables, distributing data by implementing partitioning and bucketing (see the sketch after this list), and writing and optimizing Hive queries; built a real-time pipeline for streaming data using Kafka and Spark Streaming.
  • Experienced with NoSQL databases such as HBase, MongoDB, and Cassandra; wrote a Storm topology to accept events from a Kafka producer and emit them into Cassandra.
  • Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
  • Strong hands-on experience with PySpark, using Spark libraries through Python scripting for data analysis.
  • Wrote Python scripts to access databases and execute scripts and commands.
  • Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs in Scala and Python.
  • Created an ODBC connection through Sqoop between Hortonworks and SQL Server.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server; created new schedules and checked tasks daily on the server.
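
A brief sketch of writing a partitioned, bucketed Hive table from Spark, as referenced in the partitioning/bucketing bullet above. The table names, partition column, bucket column, and bucket count are illustrative assumptions.

```python
# Hypothetical partitioned + bucketed Hive table write (PySpark).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

events = spark.table("staging.events")  # placeholder source table

# Partition by load date and bucket by the join key so downstream joins and
# aggregations can prune partitions and avoid full shuffles.
(events.write
       .mode("overwrite")
       .partitionBy("load_date")
       .bucketBy(32, "customer_id")
       .sortBy("customer_id")
       .format("parquet")
       .saveAsTable("warehouse.events_bucketed"))

# Queries against the bucketed table can then be expressed in Spark SQL.
spark.sql("""
    SELECT load_date, COUNT(*) AS events
    FROM warehouse.events_bucketed
    GROUP BY load_date
""").show()
```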

Environment: Hadoop, Hive, Apache Spark, Apache Kafka, Hortonworks, AWS, Elasticsearch, Lambda, Apache Cassandra, HBase, MongoDB, SQL, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse, Informatica PowerCenter 9.1, Tableau, Teradata 13.x, Teradata SQL Assistant.

Confidential

Java Developer

Responsibilities:

  • Involved in developing, testing and implementation of the system using Struts, JSF, and Hibernate.
  • Developed, modified, fixed, reviewed, tested, and migrated Java, JSP, XML, Servlet, SQL, and JSF code.
  • Updated user-interactive web pages from JSP and CSS to HTML5, CSS, and JavaScript for the best user experience; developed Servlets, Session Beans, and Entity Beans handling business logic and data.
  • Created enterprise deployment strategy and designed the enterprise deployment process to deploy Web Services, J2EE programs on more than 7 different SOA/WebLogic instances across development, test and production environments.
  • Designed the user interface using HTML, Swing, CSS, XML, JavaScript, and JSP.
  • Implemented the presentation using a combination of Java Server Pages (JSP) to render the HTML and well-defined API interface to allow access to the application services layer.
  • Used Enterprise JavaBeans (EJBs) extensively in the application; developed and deployed Session Beans to perform user authentication.
  • Involved in requirement analysis, design, code testing and debugging, and implementation activities.
  • Involved in performance tuning of the database and Informatica; improved performance by identifying and rectifying performance bottlenecks.
  • Understanding of how to apply technologies to solve big data problems and to develop innovative big data solutions.
  • Designed and developed Job flows using Oozie.
  • Developed Sqoop commands to pull the data from Teradata.
  • Collected data from distributed sources into Avro models, applied transformations and standardizations, and loaded the results into HBase for further processing (a hypothetical sketch follows this list).
  • Wrote PL/SQL Packages and Stored procedures to implement business rules and validations.
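
A hypothetical sketch of the Avro-to-HBase load referenced above, written here with the fastavro and happybase Python libraries for illustration (not necessarily the libraries used on the project). The host, file path, table name, column family, and field names are all placeholders.

```python
# Hypothetical Avro -> HBase load sketch using fastavro and happybase.
import happybase
from fastavro import reader

connection = happybase.Connection("hbase-thrift-host")  # placeholder Thrift host
table = connection.table("customer_events")             # placeholder table

with open("/data/in/events.avro", "rb") as fo, table.batch(batch_size=500) as batch:
    for record in reader(fo):  # each record is a dict keyed by Avro field name
        row_key = str(record["event_id"]).encode("utf-8")  # placeholder key field
        batch.put(row_key, {
            b"d:source":  str(record.get("source", "")).encode("utf-8"),
            b"d:payload": str(record.get("payload", "")).encode("utf-8"),
        })

connection.close()
```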

Environment: Java, J2EE, Java Server Pages (JSP), JavaScript, Hadoop, Oozie, Hive, Teradata, Servlets, JDBC, PL/SQL, ODBC, Struts Framework, XML, CSS, HTML, DHTML, XSL, XSLT, and MySQL.
