Sr. Hadoop Developer Resume
Chicago, IL
SUMMARY:
- 9 years of professional IT experience with emphasis on Design, Development, Implementation, Testing, Maintenance and Deployment of software applications using Java, J2EE and Big Data technologies.
- 4+ years of experience in Big Data technologies and Hadoop ecosystem components like Spark, HDFS, MapReduce, Pig, Hive, YARN, Sqoop, Flume, Kafka and NoSQL systems like HBase, Cassandra.
- Expertise in architecting real-time streaming applications and batch-style, large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Hive and Impala.
- Experience in creating complex data ingestion pipelines, data transformations, data management and data governance at the enterprise level.
- Hands-on experience in installing, configuring and monitoring HDFS clusters (on-premise and on AWS cloud).
- In-depth understanding of MapReduce and AWS cloud concepts and their critical role in the analysis of huge, complex datasets.
- Created custom Database Encryption Connectors that plug into Sqoop to encrypt data while importing it to HDFS/Hive.
- Experience in fine-tuning and troubleshooting Spark applications and Hive queries.
- Extensive hands-on experience in writing complex MapReduce jobs, Pig scripts and HiveQL scripts.
- Experience using various Hadoop Distributions (Cloudera, Hortonworks, Amazon AWS) to fully implement and leverage new Hadoop features.
- Experience in using Amazon Cloud services such as S3 and EMR.
- Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations in Cloudera Cluster.
- Experience in Apache Flume and Kafka for collecting, aggregating and moving large volumes of data from various sources such as web servers and telnet sources.
- Worked on Java HBase API for ingesting processed data to HBase tables.
- Extensive knowledge and experience in using Apache Storm, Spark Streaming, Apache Spark, Kafka and Flume in creating data streaming solutions.
- Experienced in working with machine learning libraries (Spark MLlib) and implementing ML algorithms for clustering, regression, filtering and dimensionality reduction.
- Extensive understanding of Partitions and Bucketing concepts in Hive.
- Designed and managed external tables with the right partitioning strategies to optimize performance in Hive.
- Extensive experience in working with semi-structured and unstructured data by implementing complex MapReduce programs using combiners, partitioners and design patterns.
- Created Hive UDFs to perform complex business-specific transformations and rules (see the sketch at the end of this summary).
- Expert knowledge of J2EE design patterns such as MVC architecture, Session Facade, Front Controller and Data Access Object for building J2EE applications.
- Experienced in using Agile methodologies including Extreme Programming, the Scrum process and Test-Driven Development (TDD).
- Extensive work experience in developing enterprise solutions using Java, J2EE, Struts, Servlets, Hibernate, JavaBeans, JDBC, JSP, JSF, JSTL, MVC, Spring, Custom Tag Libraries, JNDI, AJAX, SQL, JavaScript, AngularJS and XML.
- Conversant with web application servers like Tomcat, WebSphere, WebLogic and JBoss.
- Experience in developing logging standards and mechanisms based on Log4j.
- Experience in writing ANT and Maven scripts to build and deploy Java applications.
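A minimal sketch of the kind of Hive UDF referenced above, assuming a simple masking rule written in Java; the class name and logic are illustrative rather than taken from an actual project:

```java
// Illustrative Hive UDF: masks an account-style value, keeping the last four characters.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class MaskAccountUDF extends UDF {
    // Hive calls evaluate() once per row; a null input yields a null output.
    public Text evaluate(Text account) {
        if (account == null) {
            return null;
        }
        String value = account.toString();
        String masked = value.length() <= 4
                ? value
                : "****" + value.substring(value.length() - 4);
        return new Text(masked);
    }
}
```

Such a UDF is packaged into a jar, added to the session with ADD JAR and registered with CREATE TEMPORARY FUNCTION before being called from HiveQL.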
TECHNICAL SKILLS:
Big Data Ecosystem: Hadoop, HDFS, Spark, Hive, Kafka, Hue, MapReduce, YARN, Pig, Sqoop, HBase, Cassandra, Zookeeper, Oozie, Storm, Flume, Talend, Cloudera Manager, Amazon Web Services (AWS), Hortonworks and Cloudera clusters
Programming Languages: C, Java, Unix Shell Scripting, PL/SQL, Pig Latin, Python, HiveQL, Scala, SQL
Scripting Languages: Shell Scripting, JavaScript
Java Tools & Web Technologies: J2EE, JSF, EJB, HTML, XHTML, AngularJS, Servlets, JSP, JSTL, CSS, XML, XSL, XSLT, Ajax
Open Source: Hibernate, Spring AOP, Spring IOC, Spring MVC, Spring Web Flow
Web Services: SOAP, RESTful, JAX-WS
Web Application Servers: WebSphere Application Server, WebLogic, Apache Tomcat
Design: UML, Rational Rose, E-R Modelling, Microsoft Visio
Database: MongoDB, Cassandra, HBase, Oracle 11g/10g/9i, MySQL, Microsoft SQL Server, Squirrel SQL, Teradata SQL, DB2, RDBMS
Frameworks: MVC, Spring, Struts, JPA, ORM (Hibernate), JDBC
Visualization: Tableau and MS Excel
Development Tools: Eclipse, NetBeans, ANT, Maven and SBT
Version Control Systems: GitHub, CVS, SVN
PROFESSIONAL EXPERIENCE:
Confidential, Chicago, IL
Sr. Hadoop Developer
Responsibilities:
- Worked on importing and exporting data from DB2 and HP Vertica into Hive using Sqoop for visualization, analysis and report generation.
- Loaded CSV files containing user event information into Hive external tables on a daily basis.
- Created Spark applications using the DataFrame and Spark SQL APIs, primarily for event enrichment and lookups against other enterprise data sources (see the sketch following this list).
- Built a real-time streaming pipeline using Kafka integrated with Storm and Spark Streaming.
- Performed ETL on data in different formats such as JSON and CSV, converting it to Parquet while loading the final tables; ran ad-hoc queries using Hive and Impala.
- Performed complex data transformations extensively in Spark using Scala.
- Worked extensively on importing metadata into Hive using Scala and migrated existing tables and applications to Hive and the AWS cloud.
- Involved in converting Hive/SQL queries into Spark transformations using Scala.
- Connected Tableau and Squirrel SQL clients to Spark SQL (Spark Thrift Server) as a data source and ran queries.
- Worked with machine learning libraries (Spark MLlib) and performed clustering, regression, filtering and dimensionality reduction using its ML algorithms.
- Configured Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Scala (prototype).
- Used Impala as the primary analytical tool, allowing visualization servers to connect and report directly on top of Hadoop.
- Partnered with data scientists to implement Spark MLlib algorithms for clustering, regression, filtering and dimensionality reduction.
- Wrote Spark jobs for data clustering and processing using Spark MLlib clustering algorithms per functional requirements.
- Performed large-scale streaming data analytics using Storm.
- Worked on the Oozie workflow engine and schedulers to run multiple HiveQL jobs.
- Worked with various compression codecs (Snappy, Gzip, Bzip2) and file formats (Parquet, Avro, text).
- Implemented test scripts to support test driven development and continuous integration.
- Used Zookeeper to provide coordination services to the cluster.
- Used Impala and Tableau to create various reporting dashboards.
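A minimal sketch of the DataFrame/Spark SQL enrichment flow described above, assuming the Spark 2.x Java API; the paths, table name and join column are hypothetical placeholders:

```java
// Illustrative event-enrichment job: join a CSV event feed with a Hive lookup table,
// then write the enriched result as Parquet for the final tables.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EventEnrichment {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("EventEnrichment")
                .enableHiveSupport()
                .getOrCreate();

        // Daily CSV extract of user events loaded as a DataFrame.
        Dataset<Row> events = spark.read()
                .option("header", "true")
                .csv("hdfs:///data/events/current/");

        // Enterprise reference data already registered as a Hive table.
        Dataset<Row> customers = spark.table("ref.customers");

        // Lookup join for enrichment, then conversion to Parquet.
        events.join(customers, "customer_id")
              .write()
              .mode("overwrite")
              .parquet("hdfs:///data/events_enriched/");
    }
}
```

In Spark 1.x the same flow would use HiveContext and the DataFrame type in place of SparkSession and Dataset&lt;Row&gt;.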
Environment: HDFS, Hadoop 2.x, Pig, Hive, Sqoop, Flume, Spark, MapReduce, Scala, Oozie, YARN, Tableau, Squirrel SQL, Spark-SQL, Spark-MLlib, Impala, HBase, Cassandra, UNIX Shell Scripting, Storm, Zookeeper, Kafka, Agile Methodology, Cloudera 5.9, SBT.
Confidential, Denver, CO
Sr. Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured a multi-node cluster in the cloud using Amazon Web Services (AWS) EC2.
- Worked on importing and exporting data from Teradata and MySQL into Hive using Sqoop.
- Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
- Developed Hive scripts to perform data transformation and ETL processes.
- Developed MapReduce (YARN) programs to cleanse data in HDFS obtained from assorted data sources and make it suitable for ingestion into the Hive schema for analysis (see the sketch following this list).
- Designed and developed MapReduce jobs to process data coming in different file formats like XML, CSV, JSON.
- Transferred data between MySQL and the Hadoop Distributed File System (HDFS) using Sqoop with connectors.
- Created and populated Hive tables and wrote Hive queries for data analysis to meet the business requirements.
- Ran MapReduce jobs to access HBase data from the application using the Java client APIs.
- Developed Pig Latin scripts and HQL queries for the analysis of Structured, Semi-Structured and Unstructured data.
- Automated the extraction, processing and analysis of data jobs using Oozie.
- Used SVN for version control.
- Created working POCs using Spark 1.1.0 Streaming for real-time processing of continuous streams of large datasets.
- Involved in building and managing NoSQL databases such as HBase and Cassandra.
- Integrated Cassandra with Storm for real-time user attribute lookups.
- Worked in Spark to read data from Hive and write it to Cassandra using Java.
- Developed custom aggregate functions using Spark SQL and performed interactive querying.
- Developed Spark scripts using Scala and Python shell commands as per requirements.
- Installed and configured Hadoop, Scala and HDFS (AWS cloud) and was responsible for developing multiple Spark jobs in Scala, along with R scripts and CLI tooling for data management and event preprocessing.
- Implemented Spark MLlib machine learning algorithms for clustering, filtering, regression and dimensionality reduction.
- Worked on Amazon Redshift, the data warehouse product that is part of AWS (Amazon Web Services).
- Developed Spark and Spark SQL scripts to migrate data from RDBMS into AWS Redshift.
- Developed ETL scripts for data acquisition and transformation using Talend.
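A minimal sketch of a MapReduce (YARN) cleansing step of the kind described above, assuming comma-delimited input with a fixed field count; the delimiter, field count and class name are illustrative assumptions:

```java
// Illustrative map-only cleansing step: pass through only complete records.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CleanseMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

    private static final int EXPECTED_FIELDS = 8; // assumed schema width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",", -1);
        // Emit only well-formed rows; malformed rows are dropped
        // (a Hadoop counter could track them in a real job).
        if (fields.length == EXPECTED_FIELDS && !fields[0].trim().isEmpty()) {
            context.write(NullWritable.get(), value);
        }
    }
}
```

Run as a map-only job (zero reducers), the cleansed output can then be loaded into the Hive schema for analysis.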
Environment: Hadoop 2.x, Hive, HQL, HDFS, MapReduce, Spark 1.1.0, Scala, Sqoop, Storm, Kafka, Flume, Oozie, HBase, AWS-RedShift, Python, Java, Maven, Eclipse, Putty, AWS-EC2, Talend, Hortonworks.
Confidential, Minnetonka, MN
Hadoop Developer
Responsibilities:
- Analyzed the functional specifications provided by the client and developed a detailed solution design document with the architect and the team.
- Used the Hadoop architecture with MapReduce functionality and its ecosystem to solve customer requirements using the Cloudera Distribution for Hadoop (CDH).
- Developed multiple MapReduce jobs in Java for complex business requirements, including data cleansing and preprocessing.
- Created a data lake that serves as the base layer for storing and analyzing data flowing into the Hadoop platform from multiple sources.
- Involved in creating Hive tables, loading data and writing Hive queries that run internally as MapReduce jobs.
- Optimized Hive tables using techniques such as partitioning and bucketing to provide better performance for HiveQL queries.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Scripted complex HiveQL queries on Hive tables for analytical functions by implementing Hive Generic UDFs.
- Wrote custom user-defined functions (UDFs) in Java and Perl to ease further processing in Pig (see the sketch following this list).
- Involved in building applications using Maven and integrated them using continuous integration servers such as Jenkins to build jobs.
- Imported and exported data from legacy RDBMS databases into HDFS and Hive using Sqoop.
- Worked extensively on importing metadata into Hive and migrated existing tables and applications to Hive and the AWS cloud.
- Designed NoSQL schemas in HBase and Cassandra.
- Involved in Agile methodologies, daily Scrum meetings and sprint planning.
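A minimal sketch of a Java Pig UDF of the kind referenced above, assuming a simple field-normalization rule; the class name and rule are illustrative:

```java
// Illustrative Pig UDF: trims and upper-cases a single field.
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        // Pig passes each record's arguments as a Tuple; guard against nulls.
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In Pig Latin the jar is added with REGISTER and the function invoked inside a FOREACH ... GENERATE statement.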
Environment: Hadoop 1.x, HDFS, MapReduce, Hive, Pig, HBase, Tez, Sqoop, Oozie, Maven, Shell Scripting, Teradata, CDH3, Cloudera Manager.
Confidential, Raleigh, NC
Java Application Developer
Responsibilities:
- Involved in distinct phases of the Software Development Life Cycle, such as requirements gathering, data modeling, analysis, architecture design and development for the project.
- Worked in an Agile environment; responsibilities included analysis of the various applications, coordination with the client and meetings with business users.
- Used the JSF and Spring frameworks in the application, which is based on the MVC design pattern.
- Used the Spring Framework for dependency injection, security and integration with the Hibernate framework.
- Wrote JavaScript, HTML, DHTML, CSS, Servlets and JSP for designing the GUI of the application.
- Designed and developed the user interface using JSP, JSTL, JSF and custom tag libraries, with AJAX, jQuery and AngularJS to speed up the application.
- Extensively used Hibernate in the data access layer to access and update information in the database.
- Used AngularJS and jQuery to manipulate DOM objects for Ajax calls and for the user interface look and feel.
- Extensive knowledge of database practices, structures, principles and theories.
- Used PL/SQL to manage data and create tables, and performed unit testing using the JUnit framework.
- Implemented RESTful web services and integrated them with AngularJS (see the sketch following this list).
- Used Maven to build the application.
- Monitored the error logs using Log4J and fixed the problems.
- Developed the different components of the application using STS 3.4 and used SVN for version control.
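A minimal sketch of a Spring MVC (3.x) REST endpoint of the kind integrated with the AngularJS front end above; the URL, payload shape and field names are hypothetical, and in the real application the lookup would go through the Hibernate-backed data access layer described in the bullets:

```java
// Illustrative Spring MVC controller returning JSON consumed by the AngularJS client.
import java.util.HashMap;
import java.util.Map;
import org.springframework.stereotype.Controller;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;
import org.springframework.web.bind.annotation.ResponseBody;

@Controller
@RequestMapping("/api/accounts")
public class AccountController {

    @RequestMapping(value = "/{id}", method = RequestMethod.GET,
                    produces = "application/json")
    @ResponseBody
    public Map<String, Object> getAccount(@PathVariable("id") long id) {
        // Stub payload keeps the sketch self-contained; the real lookup
        // would delegate to a Hibernate-backed service/DAO.
        Map<String, Object> account = new HashMap<String, Object>();
        account.put("id", id);
        account.put("status", "ACTIVE");
        return account;
    }
}
```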
Environment: Java/J2EE, Tomcat 7.0, Spring 3.1, Hibernate 3.2, JavaScript, HTML, DHTML, CSS, Servlets, JSP, JSTL, XML, SOAP, Web Services, PL/SQL, JDBC, jQuery, Ajax, Maven, MVC, Log4j, Unix, AngularJS, JSF 1.2, STS 3.4, SVN.