Hadoop Developer Resume
White Plains, NY
SUMMARY:
- Around 7 years of experience as a developer, including extensive work with Big Data Hadoop technologies and Java/J2EE technologies.
- Comprehensive working knowledge of the Hadoop framework and ecosystem, MapReduce, and NoSQL databases in the Financial and Health Care domains.
- Worked with various development methodologies like SDLC (Waterfall Model), Agile (Scrum process) and Iterative Software development.
- Hands-on experience with Big Data tools and technologies including Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Sqoop, Flume, Kafka, Spark, Impala, Oozie, UC4, and Zookeeper.
- Experience in writing HiveQL & Pig Latin to load/analyze data in Hadoop HDFS.
- Experience in using Sqoop to migrate data between HDFS and RDBMS and using Flume to import log data.
- Experience with NoSQL column-oriented databases such as HBase and Cassandra and their integration with Hadoop clusters.
- Performed data analysis using Hive partitioning and bucketing.
- Hands-on experience with messaging systems such as Kafka 0.8+.
- Hands-on experience with Spark SQL and Spark Streaming using Scala and Python.
- Worked with efficient storage formats such as Parquet, Avro, and ORC and integrated them with the Hadoop ecosystem (Hive, Impala, and Spark); also used compression codecs such as Snappy and GZip.
- Understanding of Amazon Web Services stack and hands-on experience in using S3, EMR, Redshift, DynamoDB and hosting clusters on EC2.
- Proficient in writing SQL queries to work with relational databases such as Oracle, MySQL, MS SQL Server.
- Previous working experience in J2EE based technologies such as Core Java, JSP, JDBC.
- Working knowledge with Java MVC Frameworks including Struts, Spring, Hibernate.
- Working experience in web technologies including HTML5, CSS3, JavaScript, Web Services including REST, SOAP and Spring Framework.
- Hands-on experience with testing frameworks such as JUnit and version control systems such as Git.
- Oracle Certified Associate, Java SE 8 Programmer
- Excellent interpersonal and communication skills, creative, research-minded, technically competent and result-oriented.
TECHNICAL SKILLS:
Hadoop Ecosystem: Hadoop, HDFS, Spark 1.3+, MapReduce, Hive 0.12+, Pig 0.11+, Flume 1.3+, HBase 0.98+, Sqoop 1.4.6, Oozie 3.3+, Kafka 0.8.1+, Zookeeper 3.4+, Automic
Distributions: Cloudera, Hortonworks, MapR, Amazon Web Services - EC2, S3, EMR, DynamoDB
Databases: Oracle 9i/11g, MySQL 5.0+, MS SQL Server
NoSQL: Cassandra, MongoDB
Methodologies: Agile Scrum, Waterfall
Languages: Java 6/7/8, Scala, Python, SQL, HiveQL, Pig Latin, JavaScript, Shell Scripting
Web Technologies: Servlets 3.0, JSP, JDBC, HTML5, CSS, REST, SOAP, JSON, XML
Other: Eclipse, Maven, MVC, JUnit, Testing Whiz, Tableau, Git
Systems: Linux, UNIX, Windows
PROFESSIONAL EXPERIENCE:
Confidential, White Plains, NY
Hadoop Developer
Responsibilities:
- Extensively worked on writing shell scripts to implement dataflow logic for automated ingestion; scripts incorporated logging, email alerts, retry logic, and parameterized inputs.
- Built internal and external tables in Hive, with strong exposure to Hive DDL for creating, altering, and dropping tables, views, and partitions.
- Performed joins, dynamic partitioning, and bucketing on Hive tables using Hive SerDes such as CSV, RegEx, JSON, and Avro (see the sketch following this role).
- Worked with compression codecs such as Snappy, Gzip, and LZO to save space and optimize data transfer over the network.
- Developed a script to run multiple Spark jobs in parallel by acquiring a Spark session, submitting each job with the right configuration, and ending the session upon completion.
- Widely used Unix commands with PuTTY/Cygwin to access remote servers.
- Wrote SQL queries via Impala for accessing and analyzing the processed data.
- Involved in writing job plans in Automic (UC4) to schedule and automate end-to-end processes.
- Created a process to replicate the data to Dev/QA clusters daily.
- Designed and scheduled a workflow for a downstream system that uses the ingested data to calculate KPI metrics.
- Actively supported the production process by monitoring the jobs and diagnosing/fixing the issues to meet the SLA on time.
- Gained experience in managing and reviewing Hadoop log files.
- Maintained environment profiles specific to roles/users and scheduled cron jobs for ad hoc needs.
- Created and maintained technical documentation and runbooks for accessing the Hadoop clusters in different environments and the logistics of jobs on the client's Confluence page.
- Worked closely with business units to define development estimates according to the Agile methodology.
Environment: Hadoop, HDFS, MapReduce, Apache Hive, Apache Pig, KornShell, Spark-SQL, Automic UC4, Impala, Kerberos, Hortonworks, Python, Unix, PuTTY, MySQL, S3, Agile/Scrum, GitBash
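A minimal Scala/Spark-SQL sketch of the Hive DDL and dynamic-partition load pattern described above; the table names (logs_stg, web_logs), columns, and HDFS paths are illustrative placeholders, not the client's actual schema.

```scala
import org.apache.spark.sql.SparkSession

object HiveIngestSketch {
  def main(args: Array[String]): Unit = {
    // Hive support lets spark.sql() run HiveQL against the shared metastore
    val spark = SparkSession.builder()
      .appName("hive-ingest-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External staging table over raw CSV files already landed in HDFS
    // (OpenCSVSerde reads every column as STRING, so casts happen on insert)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS logs_stg (
        |  user_id STRING, url STRING, ts STRING, event_date STRING)
        |ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
        |LOCATION '/data/raw/web_logs'""".stripMargin)

    // Curated table, partitioned by date and stored as ORC
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_logs (
        |  user_id STRING, url STRING, ts TIMESTAMP)
        |PARTITIONED BY (event_date STRING)
        |STORED AS ORC""".stripMargin)

    // Dynamic partitioning: the target partition is derived from the event_date column
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(
      """INSERT OVERWRITE TABLE web_logs PARTITION (event_date)
        |SELECT user_id, url, CAST(ts AS TIMESTAMP), event_date FROM logs_stg""".stripMargin)

    spark.stop()
  }
}
```

In practice a run like this would be wrapped in the parameterized shell scripts noted above, which supply the paths and handle logging, retries, and alerts.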
Confidential, Roseland, NJ
Big Data Developer
Responsibilities:
- Worked extensively with Sqoop to ingest secondary data (CRM, ODS, marketing spends) from relational databases into HDFS.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Used Flume to ingest raw data in text format into HDFS, and used Flume interceptors to filter the data before ingestion.
- Developed MapReduce logic to perform sanitization to remove invalid/incomplete log files.
- Developed Hive scripts for implementing deduplication.
- Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
- Worked with Spark RDDs and DataFrames for sessionization and other transformations (see the sketch following this role).
- Wrote SQL queries via Impala for accessing and analyzing the processed data.
- Involved in writing workflows in Oozie to orchestrate multiple steps.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Collaborated with other teams using integration and defect-tracking tools such as Jenkins and JIRA.
Environment: Cloudera, Hadoop, Sqoop, Flume, Avro, Hive, Snappy compression, Spark, Impala, HBase, Oozie workflows
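A minimal Scala sketch of the DataFrame-based sessionization referenced above, using a window function and a 30-minute inactivity gap; the input/output paths, column names (user_id, event_ts), and the gap threshold are illustrative assumptions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object SessionizeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sessionize-sketch").getOrCreate()

    // Cleansed click events stored as Parquet (illustrative path and columns)
    val events = spark.read.parquet("/data/curated/clicks")

    val byUser = Window.partitionBy("user_id").orderBy("event_ts")

    // Seconds elapsed since the same user's previous event
    val withGap = events.withColumn(
      "gap_sec",
      unix_timestamp(col("event_ts")) - unix_timestamp(lag("event_ts", 1).over(byUser)))

    // A new session starts when there is no previous event or the gap exceeds 30 minutes;
    // a running sum of the start flags yields a per-user session id
    val sessions = withGap
      .withColumn("new_session",
        when(col("gap_sec").isNull || col("gap_sec") > 1800, 1).otherwise(0))
      .withColumn("session_id", sum("new_session").over(byUser))

    sessions.write.mode("overwrite").parquet("/data/curated/sessions")
    spark.stop()
  }
}
```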
Confidential, Lincoln Harbor, NJ
Hadoop Developer
Responsibilities:
- Developed Map/Reduce jobs using Java for data transformations.
- Extensively worked on performance tuning of Hive scripts.
- Developed Hive Internal and External tables, with operations to create, alter and drop tables/views.
- Applied static and dynamic partitioning and bucketing on Hive tables.
- Wrote Sqoop scripts to move data into and out of HDFS and validated the data before loading to catch duplicates.
- Developed Spark code using Scala and Spark-SQL for faster testing and processing of data (see the sketch following this role).
- Experience in using Zookeeper and Oozie for coordinating the cluster and scheduling workflows.
- Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
- Worked with compression codecs such as LZO and Snappy to save space and optimize data transfer over the network.
- Assisted in upgrading, configuring, and maintaining Hadoop ecosystem components such as Pig, Hive, and HBase.
- Worked with GitHub repositories, including branching and merging.
Environment: Hadoop, HDFS, MapReduce, MapR, Hive, Pig, Sqoop, HBase, Oozie, Zookeeper, Shell scripting, HiveQL, NoSQL (HBase), RDBMS, Eclipse, Oracle 11g, Tableau
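A minimal Scala sketch combining the Spark-SQL processing and Snappy-compressed output mentioned above; the source table (orders), columns, and output path are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object CompressedExportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("compressed-export-sketch")
      .enableHiveSupport() // read the Hive table registered in the metastore
      .getOrCreate()

    // Aggregate a Hive table with Spark-SQL (illustrative schema)
    val daily = spark.sql(
      """SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
        |FROM orders
        |GROUP BY order_date""".stripMargin)

    // Snappy trades a little compression ratio for fast (de)compression,
    // which keeps network transfer and downstream reads cheap
    daily.write
      .mode("overwrite")
      .option("compression", "snappy")
      .orc("/data/marts/daily_orders")

    spark.stop()
  }
}
```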
Confidential, Melville, NY
Hadoop Developer
Responsibilities:
- Worked with the Data Science team to gather requirements for various data mining projects.
- Loaded and transformed large sets of structured and semi-structured data.
- Wrote MapReduce jobs using the Java API.
- Imported/exported data between RDBMS and HDFS using Sqoop.
- Created Hive tables and wrote Hive queries for data analysis to meet the business requirements.
- Used Impala to read, write, and query the data in HDFS.
- Experienced in migrating HiveQL to Impala to minimize query response time (see the sketch following this role).
- Configured Hive metastore, which stores the metadata for Hive tables and partitions in a relational database.
- Worked on Flume for efficiently collecting, aggregating and moving large amounts of log data.
- Worked on configuring security for the Hadoop cluster (Kerberos, Active Directory).
- Installed and configured Zookeeper for the Hadoop cluster; set up high availability and designed automatic failover using Zookeeper.
- Tuned MapReduce programs running on the Hadoop cluster.
- Worked with application teams to install Hadoop updates, patches, version upgrades, and operating system upgrades as required.
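A minimal Scala sketch of the shared-metastore pattern behind the Hive-to-Impala migration mentioned above: Spark persists a Parquet-backed table in the Hive metastore, and Impala can then query the same table after an INVALIDATE METADATA/REFRESH; the table and column names are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object ParquetForImpalaSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("parquet-for-impala-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Summarize an existing Hive table (illustrative schema) ...
    val summary = spark.sql(
      """SELECT event_date, COUNT(*) AS page_views, COUNT(DISTINCT user_id) AS visitors
        |FROM page_views_raw
        |GROUP BY event_date""".stripMargin)

    // ... and persist it as a Parquet-backed table in the Hive metastore.
    // Impala shares that metastore, so after `INVALIDATE METADATA daily_traffic;`
    // (or `REFRESH daily_traffic;` for new files) it can serve low-latency queries.
    summary.write.mode("overwrite").format("parquet").saveAsTable("daily_traffic")

    spark.stop()
  }
}
```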
Confidential
Java/J2EE Developer
Responsibilities:
- Participated in different phases of the Software Development Lifecycle (SDLC) of the application, including requirements gathering, analysis, design, development, and deployment.
- Developed Action Forms and Controllers in Struts 2.0 framework.
- Designed, developed and maintained the data layer using Hibernate.
- Implemented and developed the application using Struts2, Servlets, JSP, JSTL, Collection API.
- Used SOAP web services for communication between applications.
- Configured the JDBC connection with Database layer.
- Involved in UI design using HTML, CSS, JavaScript, AJAX, and jQuery.
- Developed JavaScript validations on order submission forms.
- Used JUnit for unit testing of the application.
- Used Apache Ant to compile Java classes and package them into a JAR archive.
- Involved in tracking and resolving defects, which arise in QA & production environments.
Environment: Java, J2EE, JSP, Servlets, Struts 2.0/1.2, Hibernate, HTML, CSS, JavaScript, JUnit, Apache Tomcat, PL/SQL, Eclipse
Confidential
Java Developer
Responsibilities:
- Analyzed requirements and communicated them to both the development and testing teams.
- Developed and implemented business logic using Java, JSP, Servlets, Java Mail API, XML.
- Wrote SQL queries for complex operations.
- Implemented client-side validation using AJAX and JavaScript.
- Designed interactive web pages using HTML, CSS, JavaScript, and jQuery.
- Used Oracle as the backend database.
- Used Log4j with external configuration files for logging and debugging.
- Performed code reviews and unit testing with JUnit.
- Prepared user documentation for the middleware and client development teams.
- Used Eclipse/WebLogic Workshop as the IDE.
Environment: J2EE, Java, JSP, JDBC, JavaScript, HTML, XML, JMS, Eclipse IDE, PL/SQL, Oracle, JUnit, Windows