Hadoop Developer Resume
Richmond, VA
SUMMARY
- 7 years of professional experience in the IT industry developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining web-based applications using Java and J2EE.
- 3+ years of hands-on experience with the Hadoop framework and its ecosystem.
- Experience installing, configuring, and managing Cloudera (CDH3 and CDH4) and Hortonworks Hadoop platforms and clusters.
- Worked in multi-cluster environments and set up the Cloudera Hadoop ecosystem.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, MapReduce, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and Kafka.
- Experience with Apache Hadoop components such as HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, Flume, ZooKeeper, and Ambari.
- Good knowledge of Spark's in-memory capabilities and its modules: Spark Streaming, Spark SQL, and Spark MLlib.
- Hands-on experience working with Spark using both Scala and Python.
- Performed various actions and transformations on Spark RDDs and DataFrames.
- Good understanding of NoSQL databases including HBase and MongoDB.
- Expertise in writing Hadoop jobs for processing and analyzing data using MapReduce, Hive, and Pig.
- Experienced in extending Hive and Pig core functionality by writing custom UDFs using Java.
- Hands-on experience with the YARN (MapReduce 2.0) architecture and its components.
- Hands-on experience using Core Java, UNIX shell scripting, and RDBMS.
- Excellent Java development skills using J2EE, J2SE, Servlets, JUnit, JSP, JDBC.
- Very good hands-on technical knowledge of ETL tools, DataStage, SQL, and PL/SQL.
- Extensive experience with Teradata, including converting projects from Teradata to Hadoop.
- Experience implementing SOAP and REST Web Services.
- Hands-on experience with visualization tools such as Tableau and Arcadia Data.
- Well versed in Agile working environments using JIRA and version control tools such as Git and SVN.
TECHNICAL SKILLS
Hadoop/Big Data: HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Zookeeper, YARN, TEZ, Flume, Spark, Kafka
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, JNDI and Java Beans
Databases: Teradata, Oracle 11g/10g, MySQL, DB2, SQL Server, NoSQL (HBase, MongoDB)
Web Technologies: JavaScript, AJAX, HTML, XML and CSS.
Programming Languages: Java, jQuery, Scala, Python, UNIX Shell Scripting
IDE: Eclipse, NetBeans, PyCharm
Integration & Security: MuleSoft, Oracle IDM & OAM, SAML, EDI, EAI
Build Management Tools: Maven, Apache Ant
Web Services: SOAP, REST
Predictive Modelling Tools: SAS Editor, SAS Enterprise Guide, SAS Miner, IBM Cognos.
Scheduling Tools: crontab, AutoSys, Control-M
Visualization Tools: Tableau, Arcadia Data.
PROFESSIONAL EXPERIENCE
Confidential, Richmond, VA
Hadoop Developer
Responsibilities:
- Imported the retail and commercial data from various vendors into HDFS using EDE process and Sqoop.
- Designed the Cascading flow setup from the edge node to HDFS (data lake).
- Wrote Cascading code to perform several types of data transformations as required by the DA.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (PySpark).
- Involved in running analytics workloads and long running services on Apache Mesos cluster manager.
- Developed Apache Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Used Hue to create external Hive tables on both the imported and the transformed data.
- Developed code to remove or replace erroneous fields in the data using Cascading.
- Created custom functions for several data type conversions and for handling errors in the data provided by the vendor.
- Monitored the Cascading flow using the Driven component to ensure the desired result was obtained.
- Optimized a Confidential tool, Docs, for importing data and converting it into Parquet file format after validation.
- Involved in testing Spark for exporting data from HDFS to an external database in a POC.
- Developed shell scripts to automate the Cascading jobs for Control-M scheduling.
- Involved in testing AWS Redshift connectivity with a SQL database for testing and storing data in a POC.
- Developed Hive queries to analyze the data according to the customer rating ID for several projects.
- Developed various Spark Streaming applications using Python (PySpark).
- Developed Spark code using PySpark to apply various transformations and actions for faster data processing.
- Working knowledge of the Apache Spark Streaming API, which enables scalable, high-throughput, fault-tolerant stream processing of live data streams.
- Used Spark stream processing to bring data in-memory, implemented RDD transformations, and performed actions.
- Converted raw files (CSV, TSV) to file formats such as Parquet and Avro, with data type conversion, using Cascading.
- Involved in writing test cases for the Cascading jobs using the Plunger framework.
- Involved in loading structured and semi-structured data into Spark clusters using Spark SQL and the DataFrames API.
- Set up the Cascading environment and troubleshot environmental issues related to Cascading.
- Assisted in creating and maintaining technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: MapReduce, HDFS, Sqoop, Cascading, Linux, Shell, Hadoop, Spark, Hive, AWS Redshift, Hadoop Cluster
Confidential, Sunnyvale, CA
Hadoop Developer
Responsibilities:
- Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
- Installed and configured Hive, Pig, Sqoop, Flume, and Oozie on the Hadoop cluster.
- Participated in the requirement gathering and analysis phase of the project, documenting business requirements by conducting workshops and meetings with various business users.
- Involved in moving all log files generated from various sources to HDFS through Flume for further processing.
- Created HBase tables to store data in variable formats coming from different applications.
- Developed Spark jobs using Scala on top of YARN/MRv2 for interactive and batch analysis.
- Used Scala and Spark to improve the performance of and optimize existing Hadoop algorithms using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
- Good understanding of the DAG for the entire Spark application flow as shown in the Spark application web UI.
- Involved in transferring data from legacy tables to HDFS and HBase tables using Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Handled importing data from various data sources, performed transformations using Hive and MapReduce, and loaded data into HDFS.
- Analyzed the data by performing Hive queries and running Pig scripts to study behavior of lab equipment.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked on Oozie workflow engine to run multiple Hive and Pig jobs.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop YARN architecture, MapReduce, HDFS, Hive, Pig, SQL, Cloudera Manager, Sqoop, Flume, Oozie, Java (JDK 1.6), Eclipse, Linux, NoSQL.
Confidential, New York, NY
Java/Hadoop Developer
Responsibilities:
- Worked on a Hadoop cluster that ranged from 4 to 8 nodes during the pre-production stage and was sometimes extended up to 24 nodes during production.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Developed custom MapReduce programs to analyze data and used Pig Latin to clean unwanted data.
- Performed various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Involved in creating Hive tables, then applied HiveQL on those tables for data validation.
- Moved the data from Hive tables into MongoDB collections.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Participated in gathering requirements from experts and business partners and converting them into technical specifications.
- Used ZooKeeper to manage coordination among the clusters.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked on Cloudera to analyze data present on top of HDFS.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Assisted application teams in installing Hadoop updates, operating system patches, and version upgrades when required.
- Assisted in cluster maintenance, monitoring, and troubleshooting, and managed and reviewed data backups and log files.
Environment: Hadoop, Pig, Hive, Sqoop, Cloudera Manager (CDH3), Flume, MapReduce, HDFS, JavaScript, WebSphere, HTML, AngularJS, Linux, Oozie, MongoDB.
Confidential, Raleigh, NC
Java Developer
Responsibilities:
- Worked with business users to determine requirements and technical solutions.
- Followed Agile methodology (Scrum Standups, Sprint Planning, Sprint Review, Sprint Showcase and Sprint Retrospective meetings).
- Developed business components using core Java concepts and classes such as inheritance, polymorphism, collections, serialization, and multithreading.
- Used the Spring framework to handle application logic and make calls to business components, which were configured as Spring beans.
- Implemented and configured data sources and the session factory, and used HibernateTemplate to integrate Spring with Hibernate.
- Developed web services to allow communication between applications through SOAP over HTTP with JMS and Mule ESB.
- Actively involved in coding using Core Java and collection APIs such as Lists, Sets, and Maps.
- Developed a Web Service (SOAP, WSDL) that is shared between front end and cable bill review system.
- Implemented a REST-based web service using JAX-RS annotations and the Jersey implementation for data retrieval with JSON.
- Developed Maven scripts to build and deploy the application onto WebLogic Application Server, ran UNIX shell scripts, and implemented an automated deployment process.
- Used Maven as the build tool, with builds scheduled and triggered by Jenkins.
- Developed JUnit test cases for application unit testing.
- Implemented Hibernate for data persistence and management.
- Used the SoapUI tool for testing web services connectivity.
- Used SVN for version control to check in code, create branches, and tag the code.
- Used RESTful services to interact with the client by providing RESTful URL mappings.
- Used the Log4j framework for application logging and debugging.
Environment: JDK 1.6, Eclipse IDE, Core Java, J2EE, Spring, Hibernate, Unix, Web Services, SoapUI, Maven, WebLogic Application Server, SQL Developer, Camel, JUnit, SVN, Agile, Sonar, Log4j, REST, JSON, jBPM.