Sr. Hadoop Developer Resume
Chicago, IL
SUMMARY:
- 9 years of professional IT experience with emphasis on Design, Development, Implementation, Testing, Maintenance and Deployment of software applications using Java, J2EE and Big Data technologies.
- 4+ years of experience in Big Data technologies and Hadoop ecosystem components like Spark, HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
- Expertise in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Hive and Impala.
- Experience in creating complex data ingestion pipelines, data transformations, data management and data governance at the enterprise level.
- Hands-on experience in installing, configuring and monitoring HDFS clusters (on-premise and on AWS cloud).
- In-depth understanding of MapReduce and AWS cloud concepts and their critical role in the analysis of huge, complex datasets.
- Created custom Database Encryption Connectors that plug into Sqoop to encrypt data while importing it to HDFS/Hive.
- Experience in fine-tuning and troubleshooting Spark applications and Hive queries.
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts and HiveQL scripts.
- Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
- Experience in using Amazon Cloud services such as S3 and EMR.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
- Experience in Apache Flume and Kafka for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
- Worked on Java HBase API for ingesting processed data to HBase tables.
- Extensive knowledge and experience in using Apache Storm, Spark Streaming, Apache Spark, Kafka and Flume in creating data streaming solutions.
- Experienced in working with machine learning libraries (Spark MLlib) and implementing ML algorithms for clustering, regression, filtering and dimensionality reduction.
- Extensive understanding of Partitions and Bucketing concepts in Hive.
- Designed and managed external tables with the right partitioning strategies to optimize performance in Hive.
- Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using combiners, partitioners and design patterns.
- Created Hive UDFs to perform complex business-specific transformations and rules (see the sketch at the end of this summary).
- Expert knowledge of J2EE design patterns such as MVC architecture, Session Facade, Front Controller and Data Access Object for building J2EE applications.
- Experienced in using Agile methodologies including Extreme Programming, the Scrum process and Test-Driven Development (TDD).
- Extensive work experience in developing enterprise solutions using Java, J2EE, Struts, Servlets, Hibernate, JavaBeans, JDBC, JSP, JSF, JSTL, MVC, Spring, Custom Tag Libraries, JNDI, AJAX, SQL, JavaScript, AngularJS and XML.
- Conversant with web application servers like Tomcat, WebSphere, WebLogic and JBoss.
- Experience in developing logging standards and mechanisms based on Log4j.
- Experience in writing ANT and Maven scripts to build and deploy Java applications.
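A minimal sketch of the kind of Hive UDF referenced above, assuming a simple masking rule written in Java; the class name and logic are illustrative rather than taken from an actual project:

```java
// Illustrative Hive UDF: masks an account-style value, keeping the last four characters.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskAccountUDF extends UDF {
    // Hive calls evaluate() once per row; a null input yields a null output.
    public Text evaluate(Text account) {
        if (account == null) {
            return null;
        }
        String value = account.toString();
        String masked = value.length() <= 4
                ? value
                : "****" + value.substring(value.length() - 4);
        return new Text(masked);
    }
}
```

Such a UDF is packaged into a jar, added to the session with ADD JAR and registered with CREATE TEMPORARY FUNCTION before being called from HiveQL.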
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, HDFS, Spark, Hive, Kafka, Hue, MapReduce, YARN, Pig, Sqoop, HBase, Cassandra, Zookeeper, Oozie, Storm, Flume, Talend, Cloudera Manager, Amazon Web Services (AWS), Hortonworks and Cloudera clusters
Programming Languages: C, Java, Unix Shell Scripting, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Scripting Languages: Shell Scripting, JavaScript
Java Tools & Web Technologies: J2EE, JSF, EJB, HTML, XHTML, AngularJS, Servlets, JSP, JSTL, CSS, XML, XSL, XSLT, Ajax
Open Source: Hibernate, Spring AOP, Spring IOC, Spring MVC, Spring Web Flow
Web Services: SOAP, RESTful, JAX-WS
Web Application Servers: WebSphere Application Server, WebLogic, Apache Tomcat
Design: UML, Rational Rose, E-R Modelling, Microsoft Visio
Database: MongoDB, Cassandra, HBase, Oracle 11g/10g/9i, MySQL, Microsoft SQL Server, Squirrel SQL, Teradata SQL, DB2, RDBMS
Frameworks: MVC, Spring, Struts, JPA, ORM (Hibernate), JDBC
Visualization: Tableau and MS Excel
Development Tools: Eclipse, NetBeans, ANT, Maven and SBT
Version Control Systems: GitHub, CVS, SVN
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Sr. Hadoop Developer
Responsibilities:
- Worked on importing and exporting data from DB2 and HP Vertica into Hive using Sqoop for visualization, analysis and report generation.
- Loaded CSV files containing user event information into Hive external tables on a daily basis.
- Created Spark applications using the DataFrame and Spark SQL APIs, primarily for event enrichment and lookups against other enterprise data sources (see the sketch following this list).
- Built a real-time streaming pipeline using Kafka integrated with Storm and Spark Streaming.
- Performed ETL on data in different formats such as JSON and CSV, converting it to Parquet while loading the final tables; ran ad-hoc queries using Hive and Impala.
- Performed complex data transformations extensively in Spark using Scala.
- Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to Hive and the AWS cloud.
- Involved in converting Hive/SQL queries into Spark transformations using Scala.
- Connected Tableau and Squirrel SQL clients to Spark SQL (Spark Thrift Server) as a data source and ran queries.
- Worked with machine learning libraries (Spark MLlib) and performed clustering, regression, filtering and dimensionality reduction using its ML algorithms.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (prototype).
- Used Impala as the primary analytical tool, allowing visualization servers to connect and report directly on top of Hadoop.
- Partnered with data scientists to implement Spark MLlib algorithms for clustering, regression, filtering and dimensionality reduction.
- Wrote Spark jobs for data clustering and processing using Spark MLlib clustering algorithms per functional requirements.
- Performed large-scale streaming data analytics using Storm.
- Worked on the Oozie workflow engine and schedulers to run multiple HiveQL jobs.
- Worked with various compression codecs (Snappy, Gzip, Bzip2) and file formats (Parquet, Avro, text).
- Implemented test scripts to support test driven development and continuous integration.
- Used Zookeeper to provide coordination services to the cluster.
- Used Impala and Tableau to create various reporting dashboards.
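A minimal sketch of the DataFrame/Spark SQL enrichment flow described above, assuming the Spark 2.x Java API; the paths, table name and join column are hypothetical placeholders:

```java
// Illustrative event-enrichment job: join a CSV event feed with a Hive lookup table,
// then write the enriched result as Parquet for the final tables.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EventEnrichment {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("EventEnrichment")
                .enableHiveSupport()
                .getOrCreate();

        // Daily CSV extract of user events loaded as a DataFrame.
        Dataset<Row> events = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/events/current/");

        // Enterprise reference data already registered as a Hive table.
        Dataset<Row> customers = spark.table("ref.customers");

        // Lookup join for enrichment, then conversion to Parquet.
        events.join(customers, "customer_id")
              .write()
              .mode("overwrite")
              .parquet("hdfs:///data/events_enriched/");
    }
}
```

In Spark 1.x the same flow would use HiveContext and the DataFrame type in place of SparkSession and Dataset&lt;Row&gt;.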
Environment: HDFS, Hadoop 2.x, Pig, Hive, Sqoop, Flume, Spark, MapReduce, Scala, Oozie, YARN, Tableau, Squirrel SQL, Spark-SQL, Spark-MLlib, Impala, HBase, Cassandra, UNIX Shell Scripting, Storm, Zookeeper, Kafka, Agile Methodology, Cloudera 5.9, SBT.
Confidential, Denver, CO
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Worked on importing and exporting data from Teradata and MySQL into Hive using Sqoop.
- Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Developed Hive scripts to perform data transformation and ETL processes.
- Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from assorted data sources and make it suitable for ingestion into the Hive schema for analysis (see the sketch following this list).
- Designed and developed MapReduce jobs to process data coming in different file formats like XML, CSV, JSON.
- Transferred data between MySQL and the Hadoop Distributed File System (HDFS) using Sqoop with connectors.
- Created and populated Hive tables and wrote Hive queries for data analysis to meet the business requirements.
- Ran MapReduce jobs to access HBase data from the application using the Java client APIs.
- Developed Pig Latin scripts and HQL queries for the analysis of Structured, Semi-Structured and Unstructured data.
- Automated the extraction, processing and analysis of data jobs using Oozie.
- Used SVN for version control.
- Created working POCs using Spark 1.1.0 Streaming for real-time processing of continuous streams of large datasets.
- Involved in building and managing NoSQL databases such as HBase and Cassandra.
- Integrated Cassandra with Storm for real-time user attribute lookups.
- Worked in Spark to read data from Hive and write it to Cassandra using Java.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Developed Spark scripts using Scala and Python shell commands as per requirements.
- Installed and configured Hadoop, Scala and HDFS (AWS cloud) and was responsible for developing multiple Spark jobs in Scala, along with R scripts and CLI tooling for data management and event preprocessing.
- Implemented Spark MLlib machine learning algorithms for clustering, filtering, regression and dimensionality reduction.
- Worked on Amazon Redshift, the data warehouse product that is part of AWS (Amazon Web Services).
- Developed Spark and Spark SQL scripts to migrate data from RDBMS into AWS Redshift.
- Developed ETL scripts for data acquisition and transformation using Talend.
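A minimal sketch of a MapReduce (YARN) cleansing step of the kind described above, assuming comma-delimited input with a fixed field count; the delimiter, field count and class name are illustrative assumptions:

```java
// Illustrative map-only cleansing step: pass through only complete records.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanseMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 8; // assumed schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        // Emit only well-formed rows; malformed rows are dropped
        // (a Hadoop counter could track them in a real job).
        if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
            context.write(NullWritable.get(), value);
        }
    }
}
```

Run as a map-only job (zero reducers), the cleansed output can then be loaded into the Hive schema for analysis.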
Environment: Hadoop 2.x, Hive, HQL, HDFS, MapReduce, Spark 1.1.0, Scala, Sqoop, Storm, Kafka, Flume, Oozie, HBase, AWS-RedShift, Python, Java, Maven, Eclipse, Putty, AWS-EC2, Talend, Hortonworks.
Confidential, Minnetonka, MN
Hadoop Developer
Responsibilities:
- Analyzed the functional specifications provided by the client and developed a detailed solution design document with the architect and the team.
- Used the Hadoop architecture with MapReduce functionality and its ecosystem to solve customer requirements using the Cloudera Distribution for Hadoop (CDH).
- Developed multiple MapReduce jobs in Java for complex business requirements, including data cleansing and preprocessing.
- Created a data lake that serves as the base layer for storing and analyzing data flowing into the Hadoop platform from multiple sources.
- Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Scripted complex HiveQL queries on Hive tables for analytical functions by implementing Hive Generic UDFs.
- Wrote custom user-defined functions (UDFs) in Java and Perl to ease further processing in Pig (see the sketch following this list).
- Involved in building applications using Maven and integrated them using continuous integration servers such as Jenkins to build jobs.
- Imported and exported data from legacy RDBMS databases into HDFS and Hive using Sqoop.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to Hive and the AWS cloud.
- Designed NoSQL schemas in HBase and Cassandra.
- Involved in Agile methodologies, daily Scrum meetings and sprint planning.
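A minimal sketch of a Java Pig UDF of the kind referenced above, assuming a simple field-normalization rule; the class name and rule are illustrative:

```java
// Illustrative Pig UDF: trims and upper-cases a single field.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pig passes each record's arguments as a Tuple; guard against nulls.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In Pig Latin the jar is added with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.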
Environment: Hadoop 1.x, HDFS, MapReduce, Hive, Pig, HBase, Tez, Sqoop, Oozie, Maven, Shell Scripting, Teradata, CDH3, Cloudera Manager.
Confidential, Raleigh, NC
Java Application Developer
Responsibilities:
- Involved in distinct phases of the Software Development Life Cycle, such as requirements gathering, data modeling, analysis, architecture design and development for the project.
- Worked in an Agile environment; responsibilities included analysis of the various applications, coordination with the client and meetings with business users.
- Used the JSF and Spring frameworks in the application, which is based on the MVC design pattern.
- Used the Spring Framework for dependency injection, security and integration with the Hibernate framework.
- Wrote JavaScript, HTML, DHTML, CSS, Servlets and JSP for designing the GUI of the application.
- Designed and developed the user interface using JSP, JSTL, JSF and custom tag libraries, with AJAX, jQuery and AngularJS to speed up the application.
- Extensively used Hibernate in the data access layer to access and update information in the database.
- Used AngularJS and jQuery to manipulate DOM objects for Ajax calls and for the user interface look and feel.
- Extensive knowledge of database practices, structures, principles and theories.
- Used PL/SQL to manage data and create tables, and performed unit testing using the JUnit framework.
- Implemented RESTful web services and integrated them with AngularJS (see the sketch following this list).
- Used Maven to build the application.
- Monitored the error logs using Log4J and fixed the problems.
- Developed the different components of the application using STS 3.4 and used SVN for version control.
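A minimal sketch of a Spring MVC (3.x) REST endpoint of the kind integrated with the AngularJS front end above; the URL, payload shape and field names are hypothetical, and in the real application the lookup would go through the Hibernate-backed data access layer described in the bullets:

```java
// Illustrative Spring MVC controller returning JSON consumed by the AngularJS client.
import java.util.HashMap;
import java.util.Map;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/api/accounts")
public class AccountController {

    @RequestMapping(value = "/{id}", method = RequestMethod.GET,
                    produces = "application/json")
    @ResponseBody
    public Map<String, Object> getAccount(@PathVariable("id") long id) {
        // Stub payload keeps the sketch self-contained; the real lookup
        // would delegate to a Hibernate-backed service/DAO.
        Map<String, Object> account = new HashMap<String, Object>();
        account.put("id", id);
        account.put("status", "ACTIVE");
        return account;
    }
}
```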
Environment: Java/J2EE, Tomcat 7.0, Spring 3.1, Hibernate 3.2, JavaScript, HTML, DHTML, CSS, Servlets, JSP, JSTL, XML, SOAP, Web Services, PL/SQL, JDBC, jQuery, Ajax, Maven, MVC, Log4j, Unix, AngularJS, JSF 1.2, STS 3.4, SVN.