
Hadoop/Spark Developer Resume


Chicago, IL

SUMMARY

  • Around 8 years of IT experience in designing, developing, implementing, and supporting enterprise-level applications on the Hadoop framework and in Java.
  • Hands-on experience with major components of the Hadoop ecosystem, including Pig, Hive, HBase, Sqoop, and Flume, plus knowledge of the MapReduce/HDFS framework, Oozie, and Impala.
  • Experience installing and configuring Hadoop clusters using the Apache and Cloudera distributions.
  • Good experience in rapid data analysis using Pig and Hive.
  • Good knowledge of Apache Hadoop architecture and underlying concepts, including workload management, schedulers, and scalability.
  • Good programming experience with SQL and PL/SQL on relational databases including Oracle, Teradata, and MySQL.
  • Experienced with NoSQL databases: HBase, MongoDB, and Cassandra.
  • Experience with data warehousing ETL (extraction, transformation, and loading).
  • Experience writing MapReduce programs in Java and Python for data processing and analysis.
  • Hands-on experience automating data processing with Python.
  • Configured Spark Streaming to receive real-time data from Kafka and store the stream to HDFS.
  • Experienced in implementing advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Hands-on experience importing and exporting data with Sqoop.
  • Good knowledge of streaming data to HDFS using Flume.
  • Hands-on experience in Apache Hadoop administration and Linux administration.
  • Good understanding of HDFS design, daemons, federation, and high availability.
  • Good knowledge of the Hadoop MRv1 and MRv2 (YARN) architectures.
  • Knowledge of UNIX and shell scripting.
  • Good understanding of Hadoop architecture and its daemons: JobTracker, TaskTracker, NameNode, DataNode, and Secondary NameNode.
  • Familiar with object-oriented programming (OOP), object-oriented analysis (OOA), and object-oriented design (OOD) concepts.
  • Experience installing, configuring, supporting, and managing Cloudera Hadoop on AWS.
  • Good understanding of data structures and the design and analysis of algorithms.
  • Knowledge of integrating various data sources such as RDBMSs, spreadsheets, text files, and XML files.
  • Good knowledge of the software development life cycle (SDLC).
  • Worked under different methodologies, including Agile, Waterfall, and Scrum.
  • Extensive experience and domain expertise in banking and healthcare.

TECHNICAL SKILLS

Hadoop Ecosystem: Hive, Pig, HBase, Sqoop, Flume, Oozie, ZooKeeper, Spark, Scala, and Impala.

Hadoop Core: HDFS, MapReduce, YARN.

Hadoop Cluster: Apache Hadoop, Cloudera, Hortonworks.

Programming Languages: C, Java, SQL, PL/SQL, Shell Scripting, Python.

Framework Tools: Spring, Hibernate, Struts.

Databases: Oracle 8i/9i/10g, MySQL, Teradata.

NoSQL: HBase, MongoDB, Cassandra.

Web Technologies: HTML, CSS, Servlets, JSP, JavaScript.

IDE/Build Tools: Eclipse 3.3, NetBeans, Maven, Apache Ant.

Operating Systems: Windows, Linux/Unix.

Web/Application Servers: WebSphere, Apache Tomcat, BEA WebLogic, JBoss.

Methodologies: Waterfall, Agile, Scrum.

Certifications: CCNA, Oracle 9i

PROFESSIONAL EXPERIENCE

Confidential, Chicago, IL

Hadoop/Spark Developer

Responsibilities:

  • Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Used Sqoop to efficiently transfer data between databases and HDFS, and used Flume to stream log data from servers.
  • Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous sources, making it suitable for ingestion into the Hive schema for analysis.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Imported data from various sources and performed transformations using Hive and MapReduce.
  • Responsible for loading data into HDFS; extracted processed data from MySQL into HDFS using Sqoop.
  • Used the Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark in Scala.
  • Responsible for generating Scala and Java classes from the respective APIs so they could be incorporated into the overall application.
  • Implemented design patterns in Scala for the application.
  • Wrote Scala classes to interact with the database.
  • Worked on the back end using Scala and Spark to implement several pieces of aggregation logic.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala (see the first sketch after this list).
  • Configured Spark Streaming in Scala to receive real-time data from Kafka and store the stream to HDFS (see the second sketch after this list).
  • Imported data from different sources such as HDFS and HBase into Spark RDDs.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
  • Developed complex MapReduce programs in Java for data analysis on different data formats.
  • Implemented a log producer in Scala that watches application logs, transforms incremental entries, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Implemented static and dynamic partitioning in Hive.
  • Extensively used Sqoop to import/export data between RDBMSs and Hive tables, including incremental imports, and created Sqoop jobs that resume from the last saved value.
  • Wrote ETL code using Informatica to meet requirements for extracting, cleansing, transforming, and loading data from source to target data structures.
  • Performance-tuned Informatica components for daily and monthly incremental loading.
  • Created Hive queries to compare raw data with EDW reference tables and perform aggregations.
  • Managed and scheduled jobs on the Hadoop cluster.
  • Used Pig as an ETL tool for various data joins.
  • Developed simple and complex MapReduce jobs through Hive queries.
  • Analyzed data by running Hive queries and Pig scripts to understand user behavior.
  • Optimized MapReduce jobs to use HDFS efficiently through various compression mechanisms.
  • Implemented partitioning and bucketing in Hive.
  • Experienced in managing and reviewing Hadoop log files.
  • Expertise in using the ORC and Parquet file formats in Hive.
  • Created a customized BI tool for the management team that performs query analytics using HiveQL.
  • Configured Flume to extract data from web server output files and load it into HDFS.
  • Developed Pig UDFs to pre-process data for analysis.
  • Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
  • Involved in story-driven Agile development methodology and actively participated in daily Scrum meetings.
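
First sketch: a minimal Scala illustration of the MapReduce-to-RDD migration POC mentioned above, using the classic word-count shape. The input and output paths are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCountRdd {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("WordCountRdd"))
        // The two-phase MapReduce job expressed as RDD transformations:
        // the map phase becomes flatMap/map, the shuffle-and-reduce
        // phase collapses into reduceByKey.
        sc.textFile("hdfs:///data/raw/events")            // hypothetical input path
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .saveAsTextFile("hdfs:///data/out/word-counts") // hypothetical output path
        sc.stop()
      }
    }

Unlike a chain of MapReduce jobs, intermediate results stay in the Spark DAG rather than being written back to HDFS between stages, which is the main source of the speedup the POC targeted.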
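
Second sketch: the Kafka-to-HDFS stream, using the receiver-based spark-streaming-kafka API of that era. The ZooKeeper quorum, consumer group, topic, and output path are all hypothetical.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("KafkaToHdfs"), Seconds(10)) // 10-second micro-batches
        // Receiver-based Kafka stream (Kafka 0.8 era): connects through
        // ZooKeeper and yields (key, message) pairs.
        val messages = KafkaUtils.createStream(
          ssc, "zk-host:2181", "hdfs-sink-group", Map("app-logs" -> 1))
        // Persist each non-empty micro-batch to a timestamped HDFS directory.
        messages.map(_._2).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/logs/batch-${time.milliseconds}")
        }
        ssc.start()
        ssc.awaitTermination()
      }
    }

A later deployment would more likely use the direct (Kafka 0.10) API, which tracks offsets itself instead of relying on a ZooKeeper-backed receiver.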

Environment: Cloudera Hadoop, MapReduce, HDFS, Hive, Java (JDK 1.7), Pig, Linux, XML, HBase, ZooKeeper, Sqoop, Scala, Amazon Web Services (AWS), Python.

Confidential, Detroit, MI

Hadoop Developer

Responsibilities:

  • Designed and implemented MapReduce jobs to support distributed processing of the large data sets required by business use cases on the Hadoop cluster.
  • Handled importing of data from various sources, performed transformations using Pig and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Experienced in the data analysis, design, development, and MRUnit testing of Hadoop cluster jobs using Java.
  • Developed Scala and SQL code to extract data from various databases.
  • Worked on the Scala Play framework for application development.
  • Worked on AWS to create EC2 instances and installed Java, ZooKeeper, and Kafka on those instances.
  • Worked with S3 buckets on AWS to store CloudFormation templates.
  • Managed node connectivity and security on the Hadoop cluster.
  • Implemented Hive generic UDFs to encapsulate business logic.
  • Implemented a six-node CDH4 Hadoop cluster on CentOS.
  • Used the partitioning pattern in MapReduce to route records into categories.
  • Installed and configured Hive and wrote HiveQL scripts.
  • Accessed Hive tables from Java applications using JDBC to perform analytics.
  • Experienced in running batch processes using Pig scripts and developed Pig UDFs for data manipulation according to business requirements.
  • Wrote advanced MapReduce code for joins and grouping.
  • Created Hive external tables over the MapReduce output, partitioned them, and applied bucketing on top (see the sketch after this list).
  • Performed unit testing of MapReduce jobs on the cluster using MRUnit.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for dashboard reporting.
  • Wrote UNIX scripts to manage Hadoop operations.
  • Wrote Puppet manifests for the installation and configuration of Cloudera Hadoop CDH3u1.
  • Participated in the development and implementation of the Cloudera Hadoop environment.
  • Developed MapReduce jobs in Python and performed unit testing using MRUnit.
  • Involved in story-driven Agile development methodology and actively participated in daily Scrum meetings.
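
A sketch of the partitioned, bucketed external-table layout described above, issued through Spark's HiveContext (the Spark 1.x entry point for HiveQL). Table name, columns, bucket count, and locations are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    object HivePartitioning {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("HivePartitioning"))
        val hive = new HiveContext(sc)

        // External table over MapReduce output, partitioned by load date
        // and bucketed by user id.
        hive.sql("""CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
                      user_id STRING, url STRING, ts BIGINT)
                    PARTITIONED BY (load_date STRING)
                    CLUSTERED BY (user_id) INTO 32 BUCKETS
                    STORED AS ORC
                    LOCATION 'hdfs:///warehouse/web_events'""")

        // Dynamic partitioning lets one INSERT populate many date partitions.
        hive.sql("SET hive.exec.dynamic.partition=true")
        hive.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        hive.sql("SET hive.enforce.bucketing=true")
        hive.sql("""INSERT OVERWRITE TABLE web_events PARTITION (load_date)
                    SELECT user_id, url, ts, to_date(from_unixtime(ts)) AS load_date
                    FROM staging_web_events""")
        sc.stop()
      }
    }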

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java, Spark, Cloudera Distribution, Cassandra, HBase, HTML, JavaScript, XML, XSLT, jQuery, AJAX, Web Services, Agile

Confidential, Dallas, TX

Hadoop/Java Developer

Responsibilities:

  • Worked on analyzing the Hadoop cluster using different big data analytic tools, including Pig, Hive, and Sqoop.
  • Moved data into the big data warehouse, HBase, using Sqoop.
  • Experienced in managing and reviewing Hadoop log files.
  • Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
  • Collected and aggregated large amounts of web log data from different sources, such as web servers and mobile and network devices, using Apache Flume, and stored the data in HDFS for analysis.
  • Worked on the ingestion of files into HDFS from remote systems using MFT (Managed File Transfer).
  • Analyzed the web log data using HiveQL; integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs; and designed and implemented MapReduce-based large-scale parallel processing.
  • Developed the user interface using JSP, JSP tag libraries, and JavaScript to simplify the complexities of the application.
  • Implemented Model-View-Controller (MVC) architecture using Jakarta Struts.
  • Implemented a service-oriented architecture (SOA) with web services using JAX-RS (REST) and JAX-WS (see the sketch after this list).
  • Designed and developed a data management system using MySQL.
  • Prepared technical specifications against functional requirements.
  • Developed and updated the web-tier modules using the Struts 2.1 framework.
  • Modified the existing JSP pages using JSTL.
  • Implemented the Struts Validator for automated validation.
  • Wrote Spring XML configuration files containing bean declarations and the declarations of their dependent objects.
  • Used Hibernate for object/relational mapping to provide transparent persistence onto SQL Server.
  • Performed data integrity and quality testing; created and populated custom tables; and analyzed and maintained custom and package indexes in relation to process performance.
  • Used CVS for version control and JUnit for unit testing.
  • Developed user and technical documentation.
  • Involved in story-driven Agile development methodology and actively participated in daily Scrum meetings.
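
A minimal JAX-RS resource sketch of the REST side of the SOA work, written in Scala for consistency with the other sketches. The resource class, route, and payload are hypothetical.

    import javax.ws.rs.{GET, Path, PathParam, Produces}
    import javax.ws.rs.core.MediaType

    // Exposes GET /customers/{id} and returns a JSON payload.
    @Path("/customers")
    class CustomerResource {
      @GET
      @Path("/{id}")
      @Produces(Array(MediaType.APPLICATION_JSON))
      def byId(@PathParam("id") id: String): String =
        // A real implementation would look the record up in MySQL via the DAO layer.
        s"""{"id": "$id", "status": "active"}"""
    }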

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, MySQL, Cassandra, Java, J2EE, Shell Scripting, SQL, JSP, Struts, JDK, JSF, JAX-RS, JAX-WS, JavaScript, Hibernate, JUnit.

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Responsible for and active in the analysis, design, implementation, and deployment phases of the project's full software development life cycle (SDLC).
  • Designed and developed the user interface using JSP, HTML, and JavaScript.
  • Developed Struts action classes and action forms, performed action mapping using the Struts framework, and performed data validation in form beans and action classes.
  • Extensively used the Struts framework as the controller to handle client requests and invoke the model based on those requests.
  • Defined search criteria to pull customer records from the database, make the required changes, and save the updated records back.
  • Validated the fields of the user registration and login screens by writing JavaScript validations.
  • Developed build and deployment scripts using Apache Ant to customize WAR and EAR files.
  • Used the DAO pattern and JDBC for database access.
  • Developed PL/SQL stored procedures and triggers to calculate values and update tables, implementing business logic (invoked via JDBC as sketched after this list).
  • Designed and developed XML processing components for dynamic menus in the application.
  • Involved in post-production support and maintenance of the application.
  • Involved in the analysis, design, implementation, and testing of the project.
  • Implemented the presentation layer with HTML, XHTML, and JavaScript.
  • Developed web components using JSP, servlets, and JDBC.
  • Implemented the database using SQL Server.
  • Designed tables and indexes.
  • Wrote complex SQL queries and stored procedures.
  • Involved in fixing bugs and unit testing with test cases using JUnit.
  • Developed user and technical documentation.
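
A sketch of how the DAO layer would invoke one of the stored procedures through JDBC, written in Scala for consistency with the other sketches. The connection URL, credentials, procedure name, and parameters are hypothetical.

    import java.sql.DriverManager

    object AccountDao {
      def main(args: Array[String]): Unit = {
        val conn = DriverManager.getConnection(
          "jdbc:oracle:thin:@db-host:1521:ORCL", "app_user", "app_password")
        try {
          // CallableStatement is JDBC's standard way to invoke a stored procedure.
          val call = conn.prepareCall("{call update_account_balance(?, ?)}")
          call.setLong(1, 1001L)                                    // account id
          call.setBigDecimal(2, new java.math.BigDecimal("250.00")) // adjustment amount
          call.execute()
          call.close()
        } finally conn.close()
      }
    }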

Environment: Java, J2EE, WebLogic 9.2, Oracle 10g, SQL Server, JSP, Struts, JDK, JavaScript, HTML, CSS, IBM RAD 7.0, AJAX, JSTL, Ant 1.7, JUnit, Spring, Web Services.

Confidential

JAVA/J2EE Developer

Responsibilities:

  • Involved in developing JSP pages and servlet classes.
  • Developed business-layer components in Java.
  • Used JavaScript for client-side validations.
  • Used HTML and AWT with Java applets to create web pages.
  • Responsible for requirements gathering, requirements analysis, scope definition, and design.
  • Developed use cases, class diagrams, and sequence diagrams using Rational Rose.
  • Developed the user interface using JSP and HTML.
  • Deployed the application on WebSphere Application Server.
  • Wrote server-side programs using servlets (see the sketch after this list).
  • Used the Eclipse IDE for all coding in Java, servlets, and JSPs.
  • Involved in designing the screens and implementing client- and server-side validations using JavaScript and the validation framework.
  • Used JDBC connections to store and retrieve data from the database.
  • Experience in analyzing business rules to prepare test cases.
  • Reviewed test cases and provided comments.
  • Experience in GUI, system, system integration, and regression testing.
  • Experience in designing, reviewing, and executing test cases.
  • Used Flex styles and CSS to manage the look and feel of the application.
  • Prepared design documents for developed code and maintained the defect tracker.
  • Involved in unit testing and bug fixing.
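
A minimal servlet sketch of the server-side programs mentioned above (the original work was in Java; Scala is used here for consistency with the other sketches). The servlet name, parameter, and output are hypothetical.

    import javax.servlet.http.{HttpServlet, HttpServletRequest, HttpServletResponse}

    // Handles GET requests and renders a small HTML response.
    class GreetingServlet extends HttpServlet {
      override def doGet(req: HttpServletRequest, resp: HttpServletResponse): Unit = {
        resp.setContentType("text/html")
        val name = Option(req.getParameter("name")).getOrElse("guest")
        val out = resp.getWriter
        out.println("<html><body>")
        out.println(s"Hello, $name")
        out.println("</body></html>")
      }
    }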

Environment: Java, J2EE (JSPs & Servlets), JUnit, HTML, CSS, JavaScript, MySQL
