Sr. Hadoop Developer/Data Engineer/Azure Developer Resume

Milwaukee, WI

PROFESSIONAL SUMMARY:

  • Around 8 years of professional IT experience in all phases of the Software Development Life Cycle, including hands-on experience in Java/J2EE technologies and Big Data analytics.
  • Substantial experience developing Spark and MapReduce jobs in Java and Scala, along with experience in Pig, Flume, Sqoop, Zookeeper, Kafka, HBase, Phoenix, and Hive.
  • Hands-on experience installing, configuring, and using ecosystem components such as Hadoop, Spark, HBase, Zookeeper, Pig, Hive, Hortonworks, Cassandra, Sqoop, and Flume.
  • Extensive knowledge of DevOps automation tools such as Puppet and Chef.
  • Experience in web-based languages such as HTML, CSS, PHP, and XML, and in other web methodologies including Web Services and SOAP.
  • Good experience importing and exporting data between HDFS and relational database management systems using Sqoop.
  • Extensive experience in NoSQL databases such as HBase, Cassandra, and Phoenix.
  • Worked in multi-cluster environments and set up Cloudera, Hortonworks, EMR, and Azure distributions.
  • Background with traditional databases such as Oracle, Teradata, Netezza, SQL Server, ETL tools/processes and Data warehousing architectures.
  • Proficient in working with Spark programs using Scala and Python.
  • Experience transferring streaming data from different data sources into HDFS and HBase using Apache Flume, Kafka, and Spark Streaming.
  • Experienced in using Zookeeper and Oozie operational services for coordinating the cluster and scheduling workflows.
  • Extensive experience in Java and J2EE technologies like Servlets, JSP, Enterprise Java Beans (EJB), JDBC.
  • Experienced in importing data from various sources, performing transformations using Hive and MapReduce, loading data into HDFS, and extracting data from relational databases such as Oracle, MySQL, MS SQL, and Teradata into HDFS and Hive using Sqoop.
  • Expertise in writing MapReduce jobs in Java that process large structured, semi-structured, and unstructured data sets and store the results in HDFS.
  • Experienced in application development using Java, Scala, Hadoop, RDBMS, and Linux shell scripting, and in performance tuning.
  • Experienced in loading data into Hive partitions and creating buckets in Hive.
  • Experienced in relational databases like MySQL, Oracle and NoSQL databases like HBase and Cassandra.
  • Hands-on experience developing Hadoop clusters on public and private cloud environments such as Amazon AWS, Azure, and OpenStack.

TECHNICAL SKILLS:

Java/J2EE Technologies: JSP, Servlets, jQuery, JDBC, JavaScript

Hadoop/Big Data: Hadoop, Hive, Pig, HBase, MapReduce, Zookeeper, Sqoop, Oozie, Flume, Storm

Programming Languages: Java, J2EE, HQL, R, Python, XPath, PL/SQL, Pig Latin.

Spark Ecosystems: Spark SQL, Spark Streaming, Kafka, Phoenix, Cassandra, Alluxio, Flink

Web Technologies: HTML, XML, DHTML, XHTML, CSS, XSLT.

Web/Application servers: Apache HTTP Server, Apache Tomcat, JBoss.

Databases: Microsoft Access, MongoDB, Cassandra, MS SQL, Oracle.

PROFESSIONAL EXPERIENCE:

Confidential, Milwaukee, WI

Sr. Hadoop Developer/Data Engineer/Azure Developer

Responsibilities:

  • Involved in Architecture and System Design and development process.
  • Worked with off-site (USA based) resources for successful implementation of the Workflow module.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
  • Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Designed and implemented migration strategies for moving traditional systems to Azure (lift and shift/Azure Migrate, other third-party tools).
  • Imported data from databases such as MySQL, Oracle, and Teradata into HDFS, and exported it back, using Sqoop.
  • Worked with real-time streaming applications and batch-style, large-scale distributed computing applications using Spark Streaming, Kafka, Flume, MapReduce, and Hive.
  • Implemented and integrated NoSQL databases such as HBase and Apache Cassandra.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Managed and scheduled batch jobs on the Hadoop cluster using Oozie.
  • Managed and reviewed Hadoop log files.
  • Used Zookeeper to provide coordination services to the cluster.
  • Used Microsoft Azure to build, test, and deploy applications.
  • Used Sqoop to import data from RDBMS into HDFS and vice versa.
  • Working experience with and understanding of Spark and Storm.

Confidential, San Diego, CA

Sr. Hadoop Developer

Responsibilities:

  • Launched and set up the Hadoop cluster, including configuring its different components.
  • Loaded data from the UNIX file system into HDFS.
  • Wrote MapReduce jobs to parse web logs stored in HDFS.
  • Managed the Hadoop distribution with Cloudera Manager, Cloudera Navigator, and Hue.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Provided cluster coordination services through Zookeeper.
  • Designed and implemented Hive queries and functions for evaluation, filtering, loading, and storing of data.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently, triggered by time and data availability.
  • Utilized the Apache Hadoop environment provided by Cloudera.
  • Applied partitioning and bucketing concepts in Hive and analyzed the data using HiveQL.
  • Installed and configured Flume, Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Created Hive tables, loaded data, and ran Hive queries on that data.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
  • Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
  • Loaded data into HBase using Pig, Hive, and Java APIs.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data.
  • Performed CRUD operations in HBase.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile devices, using Apache Flume and stored the data in HDFS/HBase for analysis.
  • Developed a Flume ETL job to handle data from an HTTP source with HDFS as the sink.
  • Wrote optimized Pig scripts and developed and tested Pig Latin scripts.
  • Created MapReduce programs for some refined queries on big data.
  • Working knowledge of writing Pig Load and Store functions.
  • Analyzed the requirements for the project.
  • Retrieved data from multiple Oracle tables, analyzed it using Spark, and stored the results in HBase via Phoenix.
  • Sent device data to the correlation layer, using Spark and Kafka, to determine why a problem occurred and how to resolve it.
  • Raised a request ticket if the problem was not resolved automatically.
  • Updated the status in the Phoenix database once a problem was resolved.
  • Automated the entire process using Jenkins.
  • Created test cases and performed performance testing and monitoring.
  • Performed unit testing and prepared the UTC document.
  • Prepared release notes and deployment documents.
  • Modified the code and analyzed the Spark web UI to optimize query performance.

Technology & Tools: Spark SQL, Spark Streaming, Kafka, GraphFrames, Phoenix

Environment: Apache Hadoop 1.0.1, MapReduce, Cloudera, HDFS, CentOS, Zookeeper, Sqoop, Cassandra, Hive, Pig, Oozie, Java, Eclipse, Amazon EC2, JSP, Servlets.

Confidential, Atlanta, GA

Sr. Hadoop Developer

Responsibilities:

  • Hands-on experience developing and deploying enterprise applications using major components of the Hadoop ecosystem such as Hadoop 2, YARN, Hive, Pig, MapReduce, HBase, Flume, Sqoop, Spark, Storm, Kafka, Oozie, and Zookeeper.
  • Experience installing, configuring, supporting, and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
  • Excellent programming skills at a higher level of abstraction using Scala and Spark.
  • Good understanding of real-time data processing using Spark.
  • Imported data from databases such as MySQL, Oracle, and Teradata into HDFS, and exported it back, using Sqoop.
  • Worked with real-time streaming applications and batch-style, large-scale distributed computing applications using Spark Streaming, Kafka, Flume, MapReduce, and Hive.
  • Implemented and integrated NoSQL databases such as HBase and Apache Cassandra.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Managed and scheduled batch jobs on the Hadoop cluster using Oozie.
  • Managed and reviewed Hadoop log files.
  • Used Zookeeper to provide coordination services to the cluster.
  • Used Microsoft Azure to build, test, and deploy applications.
  • Used Sqoop to import data from RDBMS into HDFS and vice versa.
  • Working experience with and understanding of Spark and Storm.
  • Hands-on experience extracting data from log files and copying it into HDFS using Flume.
  • Experience analyzing data using Hive, Pig Latin, and custom MapReduce programs in Java.
  • Hands-on experience in the analysis, design, coding, and testing phases of the Software Development Life Cycle (SDLC).
  • Experience with multiple databases and tools, including SQL analytical functions, Oracle PL/SQL, SQL Server, and DB2.
  • Experience creating ETL jobs, both design and code, to process data into target databases.
  • Worked with different file formats such as Avro, Parquet, RCFile, and JSON.
  • Wrote Python scripts to build a disaster recovery process that moves currently processed data into the data center by providing its current static location.
  • Hands-on experience with NoSQL databases such as MongoDB, HBase, and Cassandra and their integration with the Hadoop cluster.
  • Developed a web application in the open-source Java framework Spring, using Spring MVC.
  • Experienced in front-end development using Ext JS, jQuery, JavaScript, HTML, Ajax, and CSS.
  • Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adapt to new technologies with ease, and a good team member.
  • Analyzed the requirements for the project.
  • Extracted data from source systems, transformed it, and loaded it into Hive.
  • Converted Java objects to CSV files for further processing.
  • Debugged the code and performed unit testing.
  • Scheduled shell scripts using Control-M.
  • Optimized query performance.
  • Analyzed performance using the Spark web UI.
  • Wrote alternative code to optimize Spark performance.
  • Validated and tested the code.

Technology & Tools: Spark 2.1, Cloudera, Control-M, Hive, Java

Environment: Hadoop, YARN, HBase, Azure, SDLC, Cloudera, MVC, NoSQL, Kafka, Python, Zookeeper, Oozie, jQuery, JavaScript, HTML, Ajax and CSS.

Confidential, Omaha, NE

Sr. Spark/Hadoop Developer

Responsibilities:

  • Migrated the existing architecture to Spark Streaming to process live streaming data.
  • Responsible for Spark Core configuration based on the type of input source.
  • Executed Spark code using Scala for Spark Streaming/SQL for faster data processing.
  • Performed SQL joins among Hive tables to produce input for the Spark batch process.
  • Gathered business requirements from business partners and subject matter experts.
  • Developed Python code to gather data from HBase and designed the solution for implementation in PySpark.
  • Exported the analyzed patterns back into Teradata using Sqoop.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Developed PySpark code to mimic the transformations performed in the on-premises environment.
  • Analyzed the SQL scripts and designed solutions for implementation in PySpark.
  • Created custom new columns, depending on the use case, while ingesting data into Hadoop using PySpark.
  • Developed an environmental search engine using Java, Apache Solr, and MySQL.
  • Analyzed the Cassandra database and compared it with other open-source NoSQL databases to find which best suits the current requirements.
  • Integrated Cassandra as a distributed persistent metadata store to provide metadata resolution for network entities.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Designed multiple Python packages that were used within a large ETL process to load 2TB of data from an existing Oracle database into a new PostgreSQL cluster.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs.
  • Loaded data from the Linux file system into HDFS and vice versa.
  • Developed UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation queries, writing results back into OLTP through Sqoop.
  • Knowledge of ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
  • Installed and monitored Hadoop ecosystem tools on multiple operating systems such as Ubuntu and CentOS.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark, written in Scala.
  • Participated in the development and implementation of a Cloudera Impala Hadoop environment.
  • Collected data using Spark Streaming and loaded it into the Cassandra cluster.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, and for writing data back into the OLTP system through Sqoop.
  • Extended Hive and Pig core functionality by writing custom UDFs.
  • Wrote Java code to format XML documents and upload them to the Solr server for indexing.
  • Used AWS to export MapReduce jobs into Spark RDD transformations.
  • Wrote Terraform templates for automation requirements in AWS services.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Deployed and configured AWS EC2 for client websites moving from self-hosted services, for scalability.
  • Worked with multiple teams to provision AWS infrastructure for development and production environments.
  • Designed Kafka for a multi-datacenter cluster and monitored it.
  • Designed the number of partitions and replication factor for Kafka topics based on business requirements.
  • Migrated MapReduce programs into Spark transformations using Spark and Scala; the initial versions were written in Python (PySpark).
  • Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
  • Experience with Kafka and Spark integration for real-time data processing.
  • Developed Kafka producer and consumer components for real-time data processing.
  • Hands-on experience setting up Kafka MirrorMaker for data replication across clusters.
  • Experience configuring, designing, implementing, and monitoring Kafka clusters and connectors.
  • Tuned Oracle SQL using EXPLAIN PLAN.
  • Manipulated, serialized, and modeled data in multiple formats such as JSON and XML.
  • Involved in setting up MapReduce 1 and MapReduce 2.
  • Prepared Avro schema files for generating Hive tables.
  • Used Impala connectivity from the user interface (UI) and queried the results using ImpalaQL.
  • Worked on physical transformations of the data model, which involved creating tables, indexes, joins, views, and partitions.
  • Involved in analysis, system architectural design, process interface design, and documentation.
  • Used Jira for bug tracking and Bitbucket to check in and check out code changes.
  • Involved in Cassandra data modeling to create keyspaces and tables in a multi-datacenter DSE Cassandra DB.
  • Utilized Agile and Scrum methodology to help manage and organize a team of developers with regular code review sessions.

Confidential, Edwardsville, IL

Hadoop Developer

Responsibilities:

  • Developed simple to complex MapReduce jobs in Java for processing and validating the data.
  • Developed a data pipeline using Sqoop, Spark, MapReduce, and Hive to ingest, transform, and analyze customer behavioral data.
  • Exported analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Implemented Spark using Python and Spark SQL for faster data processing, and algorithms for real-time analysis in Spark.
  • Used Spark for interactive queries, processing of streaming data, and integration with a popular NoSQL database for huge volumes of data.
  • Used the Spark-Cassandra Connector to load data to and from Cassandra.
  • Streamed data in real time using Spark with Kafka.
  • Developed Kafka producers and consumers in Java, integrated them with Apache Storm, and ingested data into HDFS and HBase by implementing the rules in Storm.
  • Built a prototype for real-time analysis using Spark Streaming and Kafka.
  • Moved log files generated from various sources to HDFS through Flume for further processing.
  • Created Hive tables, worked on them using HiveQL, and performed data analysis using Hive and Pig.
  • Developed workflows in Oozie to manage and schedule jobs on the Hadoop cluster, triggering daily, weekly, and monthly batch cycles.
  • Experience with job workflow scheduling and monitoring tools such as Oozie and Zookeeper.
  • Expertise in extending Hive and Pig core functionality by writing custom User Defined Functions (UDFs).
  • Used Impala to pull data from Hive tables.
  • Used Apache Flume to collect and aggregate large amounts of log data and stored it in HDFS for further analysis.
  • Created and developed end-to-end data ingestion onto Hadoop.
  • Involved in the architecture and design of a distributed time-series database platform using NoSQL technologies such as Hadoop/HBase and Zookeeper.
  • Integrated NoSQL databases such as HBase with MapReduce to move bulk data into HBase.
  • Efficiently put and fetched data to/from HBase by writing MapReduce jobs.

Environment: Hadoop, Kafka, Spark, Sqoop, Hive, Pig, NoSQL, Impala, Oozie, HBase, Zookeeper.

Confidential

Java Developer

Responsibilities:

  • Involved in Architecture and System Design and development process.
  • Worked with off-site (USA based) resources for successful implementation of the Workflow module.
  • Created UI screens using Struts MVC for logging into the system and performing various operations on network elements.
  • Classified users into various organizations to differentiate their privileges in accessing the system.
  • Developed use cases, business logic, and unit tests for the Struts-based application.
  • Developed JSP pages using custom tags, the Tiles framework, and the Struts framework.
  • Developed UI screens for presentation logic using JSP, Struts Tiles, and HTML.
  • Used display tag to render large volumes of data.
  • Used Bean, HTML, and Logic tags to avoid Java expressions and scriptlets in JSP.
  • Implemented design patterns such as Session Façade, Command, Singleton, and DAO in the business layer.
  • Created EJBs for back-end operations and used Hibernate for database persistence.
  • Sent message objects using JMS to client queues and topics.
  • Created unit test cases.
  • Used Log4j for logging and defined debug levels to control the log.
  • Built the application EAR using Ant.
  • Included Hibernate 3.0 annotations for Oracle DB.

Environment: Java 1.5, JavaScript, CSS, AJAX, J2EE, JSP, EJB, Struts 1.2, WebSphere 5.0, Apache Tomcat, Web Services, Hibernate, JMS, XML, XSL, HTML.