Hadoop/Spark Developer Resume

Beaverton, OR

PROFESSIONAL SUMMARY:

  • Around 8 years of professional IT experience, including 4 years with big data ecosystem technologies.
  • 4 years of hands-on experience with the Hadoop ecosystem: MapReduce, HDFS, Hive, Pig, HBase, Spark, Apache Kafka, Sqoop, Oozie and Cloudera.
  • 5+ years of IT experience in the design and development of Java/J2EE applications, with strong object-oriented programming and SQL skills.
  • Excellent knowledge of Hadoop architecture (CDH3/CDH4 and Hortonworks) and MapReduce programming using Java and Hive.
  • Extensive experience creating UDFs and using them in Hive.
  • Experience importing and exporting data between HDFS and relational database systems using Sqoop.
  • Used Hadoop ecosystem components to store and process data, and exported the data to Tableau using a live connection.
  • Strong knowledge of Hadoop and Hive’s analytical functions.
  • Good working knowledge of build scripts and automated solutions using scripting languages such as Shell and Python.
  • Knowledge of cloud technologies such as AWS and Amazon Elastic MapReduce (EMR).
  • Ensured data integrity and data security on AWS by implementing AWS best practices.
  • Ability to identify and gather requirements to define a solution to be built and operated on AWS.
  • Experienced in Agile methodologies, with Scrum stories and sprints in a Python-based environment, along with data analytics, data wrangling and Excel data extracts.
  • Experienced in cloud automation using AWS CloudFormation templates, Chef and Puppet.
  • Experience using Oozie to define and schedule jobs.
  • Experience with storage and processing through Hue across all Hadoop ecosystem components.
  • Expertise in sharding distributed Cassandra and MongoDB systems. Experience building Cassandra clusters and monitoring them for resource utilization.
  • Managed Cassandra clusters using DataStax OpsCenter. Knowledge of Cassandra backup and recovery and of Cassandra security.
  • Experience in Kafka offset handling, load balancing, message ordering, performance handling and topic maintenance to make data highly available for downstream processes.
  • Configured ZooKeeper and Kafka clusters.
  • Load and transform large sets of structured, semi-structured and unstructured data using Hadoop ecosystem components.
  • Administer and maintain Pivotal/Hortonworks Hadoop, Greenplum and GemFire clusters across all environments.
  • Managed and scheduled jobs on Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions.
  • Hands on experience using Cloudera and Hortonworks Hadoop Distributions.
  • Good knowledge of AWS infrastructure services: Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
  • Working knowledge of multi-tiered distributed environments and OOAD concepts; good understanding of the Software Development Lifecycle (SDLC).
  • Proficient in web application development using Java, J2EE (JSP, Servlets, XML, JDBC), JavaScript, HTML, web services, Oracle and DB2.
  • Excellent working knowledge of popular frameworks like Struts, Hibernate, and Spring MVC.
  • Experience in working with web development technologies such as HTML, CSS, and JavaScript.
  • Hands-on experience working with web services such as RESTful services.
  • Experience in Agile engineering practices.
  • Comfortable with J2EE-compliant IDEs such as Eclipse.
  • Knowledge of Ant and Maven scripts to build and deploy Java applications.
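The Kafka offset-handling and message-ordering experience listed above can be illustrated with a small sketch in Python. The `OffsetTracker` class is hypothetical and is not part of any Kafka client. It records the last processed offset for each topic partition. It rejects out-of-order offsets, because Kafka only guarantees ordering within a partition. It reports commit positions as the last offset + 1, because a committed offset names the next record the consumer should read. A real consumer would pass these positions to its client library's commit call, for example `commitSync` on the Java `KafkaConsumer`.

```python
from typing import Dict, Tuple

TopicPartition = Tuple[str, int]


class OffsetTracker:
    """Hypothetical helper: tracks the last processed offset per partition."""

    def __init__(self) -> None:
        self._last: Dict[TopicPartition, int] = {}

    def mark_processed(self, topic: str, partition: int, offset: int) -> None:
        key = (topic, partition)
        prev = self._last.get(key)
        # Kafka orders records only within a partition, so offsets for one
        # partition must keep increasing (gaps are allowed, e.g. after
        # log compaction).
        if prev is not None and offset <= prev:
            raise ValueError(f"offset {offset} is not after {prev} on {key}")
        self._last[key] = offset

    def offsets_to_commit(self) -> Dict[TopicPartition, int]:
        # The committed offset is the position of the next record to read.
        return {key: last + 1 for key, last in self._last.items()}


tracker = OffsetTracker()
for topic, partition, offset in [("clicks", 0, 0), ("clicks", 0, 1), ("clicks", 1, 0)]:
    tracker.mark_processed(topic, partition, offset)
print(tracker.offsets_to_commit())  # {('clicks', 0): 2, ('clicks', 1): 1}
```

Committing after processing rather than before gives at-least-once delivery. A crash between processing and committing replays the uncommitted records instead of losing them.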

TECHNICAL SKILLS:

Big Data Ecosystems: HDFS, Sqoop, Flume, Hive, Pig, MapReduce, YARN, Oozie, Kafka, Spark and HBase.

Big Data Platforms: Hortonworks, Cloudera and AWS.

Databases: NoSQL, Oracle and MySQL.

Languages: SQL, Pig Latin, HiveQL, Unix shell, Java and Scala.

Operating Systems: Linux, Windows.

Development Methodologies: Agile, Waterfall.

Web technologies: HTML, XML.

PROFESSIONAL EXPERIENCE:

Confidential - Beaverton, OR

Hadoop/Spark Developer

Responsibilities:

  • Worked on Apache Spark with Python, using SQL/Hive contexts to process clickstream data for Nike’s product suites such as NRC, NTC and Nike.com.
  • Used Airflow to schedule the Spark jobs.
  • Debugged and performance-tuned Hive jobs.
  • Migrated tables from RC format to ORC and other customized file formats, and worked on data ingestion.
  • Wrote AutoSys jobs to schedule the reports.
  • Implemented test scripts to support test driven development and continuous integration.
  • Extended Pig and Hive core functionality by writing custom User Defined Functions for data analysis and file processing, used from Pig Latin scripts.
  • Created Hive internal and external tables using a shared metastore.
  • Wrote Sqoop queries to import data into Hadoop from Teradata/SQL Server.
  • Knowledge of streaming data into HDFS using Flume.
  • Imported data into HBase using the HBase shell.
  • Worked on the Hortonworks Data Platform Hadoop distribution, using Hive to store, query and retrieve data.
  • Continuously monitored and managed the Hadoop cluster through HDP (Hortonworks Data Platform).
  • Analyzed the volume of the existing batch process and designed the Kafka topics and partitions.
  • Stored data in AWS S3 in the same way as in HDFS, and ran EMR programs on the data stored in S3.
  • Worked with the Kafka Producer API and created a custom partitioner to publish data to the Kafka topic. Worked on a POC for streaming data using Kafka and Spark Streaming.
  • Implemented a Kafka consumer with Spark Streaming and Spark SQL using Scala. Validated the DStream, generated a new DStream from it and saved the data in HDFS.
  • Imported real-time data into Hadoop using Kafka and implemented a daily Oozie job. Developed Hive DDLs to create, alter and drop Hive tables, and worked with Storm and Kafka.
  • Processed data through an ETL pipeline orchestrated by AWS Data Pipeline, using Pig and Hive on Amazon EMR.
  • Triggered Spark jobs on AWS Elastic MapReduce (EMR) cluster resources and fine-tuned them based on cluster scalability.
  • Transferred data from different data sources into HDFS using Kafka producers, consumers and brokers.
  • Interacted with Cloudera support, logged issues in the Cloudera portal and fixed them according to the recommendations.
  • Upgraded the Cloudera Hadoop ecosystem in the cluster using Cloudera distribution packages.
  • Debugged and resolved major issues with Cloudera Manager by working with the Cloudera team.
  • Used Apache Oozie for scheduling and managing the Hadoop Jobs.
  • Extensive experience with Amazon Web Services (AWS).
  • Developed a Python/Django application for Google Analytics aggregation and reporting.
  • Developed and updated social media analytics dashboards on a regular basis.
  • Extensive knowledge of RDBMS such as Oracle, Microsoft SQL Server and MySQL.
  • Good understanding of ZooKeeper for monitoring and managing Hadoop jobs.
  • Good understanding of NoSQL databases such as HBase, Cassandra and MongoDB.
  • Supported MapReduce programs running on the cluster and wrote custom MapReduce programs in Java for data processing.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Experience with Linux, Red Hat and UNIX operating systems.
  • Developed Spark jobs using Scala in a test environment for faster data processing and used Spark SQL for querying.

Environment: HDFS, Pig, Hive, MapReduce, Spark, Sqoop, HBase, Apache Kafka 0.8/0.9/0.10, ZooKeeper, NoSQL and Linux.
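The custom partitioner mentioned in this role decides which partition of the Kafka topic each record goes to. In a Java producer, this logic would live in an implementation of Kafka's `Partitioner` interface. The Python sketch below shows only the routing idea, under these assumptions:

- `KeyHashPartitioner` is hypothetical.
- It hashes with CRC32 rather than the murmur2 hash of Kafka's default partitioner, so its partition numbers will not match a stock producer.
- It spreads keyless records round-robin as a design choice for illustration.

```python
import itertools
import zlib
from typing import Optional


class KeyHashPartitioner:
    """Hypothetical custom partitioner: the same key always maps to the same partition."""

    def __init__(self, num_partitions: int) -> None:
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.num_partitions = num_partitions
        self._round_robin = itertools.cycle(range(num_partitions))

    def partition(self, key: Optional[bytes]) -> int:
        if key is None:
            # Keyless records carry no ordering requirement: spread them evenly.
            return next(self._round_robin)
        # Keyed records: a stable hash keeps every event for one key (for
        # example one user's clickstream) in one partition, preserving order.
        return zlib.crc32(key) % self.num_partitions


p = KeyHashPartitioner(6)
assert p.partition(b"user-42") == p.partition(b"user-42")
print([p.partition(None) for _ in range(4)])  # [0, 1, 2, 3]
```

Hashing on the key trades perfectly even load for per-key ordering. A very hot key will concentrate traffic on one partition, which is one reason to size the topic's partitions against the observed batch volume.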

Confidential, Wheeling, IL

Hadoop Developer

Responsibilities:

  • Loaded disparate data sets from different sources into the BDpaas (Hadoop) environment using Sqoop.
  • Developed UNIX scripts to create batch loads and driver code for bringing large volumes of data from relational databases to the big data platform.
  • Developed Pig scripts to load data into HBase. Used Hive queries to create ORC tables.
  • Worked with a team of developers on Python applications for risk management.
  • Used EMR (Elastic MapReduce) to perform big data operations in AWS. Created ORC tables to improve performance for reporting purposes.
  • Involved in the coding and integration of several business-critical modules of the CARE application using Java, Spring, Hibernate and REST web services on WebSphere Application Server.
  • Knowledge of Cassandra maintenance and tuning, both database and server. Databases: Cassandra, MongoDB, MySQL, Oracle.
  • Coordinated Kafka operations and monitoring (via JMX) with DevOps personnel; formulated partition leadership balancing strategies and assessed the impact of producer and consumer message (topic) consumption to prevent overruns. Aggressively monitored partitioning versus topic production via JMX interfaces and developed a standalone Kafka setup.
  • POCs with the Confluent Schema Registry, REST Proxy, and Kafka connectors for Cassandra and HDFS (Hadoop 2.0).
  • Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
  • Very good experience in monitoring and managing the Hadoop cluster using Hortonworks.
  • Responsible for migrating the code base from the Cloudera platform to Amazon EMR, and evaluated Amazon ecosystem components such as Redshift and DynamoDB.
  • Developed MapReduce/ EMR jobs to analyze the data and provide heuristics and reports. The heuristics were used for improving campaign targeting and efficiency.
  • Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Custom Kafka broker design to reduce message retention from the default 7-day retention to 30 minutes; architected a lightweight Kafka broker.
  • Developed and tested many features for a dashboard using Python, Bootstrap, CSS, and JavaScript.
  • Implemented business logic using Python/Django. Created Health Allies Eligibility and Health Allies Transactional feed extracts using Hive, HBase and UNIX to migrate feed generation from a mainframe application called CES (Consolidated Eligibility Systems) to the big data platform.
  • Used bucketing in Hive to improve the performance of HQL queries. Developed Spark scripts using Scala shell commands.
  • Created a MapReduce program that compares the current and prior versions of data in HBase to identify transactional updates. These updates are loaded into Hive external tables, which are in turn referenced by the Hive scripts that generate the transactional feeds.

Environment: HDFS, Pig, Hive, MapReduce, Spark, Sqoop, HBase, Apache Kafka 0.8/0.9/0.10, ZooKeeper, NoSQL and Linux.
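The HBase version-comparison job described in this role can be sketched as the per-row logic a mapper might apply. This is a hypothetical Python illustration, not the original Java MapReduce code:

- Plain dictionaries stand in for an HBase row read with several versions: column → list of (timestamp, value), newest first, matching the order in which HBase returns cell versions.
- The output is tab-separated text with `\N` for nulls, the default null encoding for Hive text tables.
- The function names and sample columns are invented.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for one HBase row read with several versions:
# column -> [(timestamp, value), ...], newest version first.
Row = Dict[str, List[Tuple[int, str]]]
Update = Tuple[str, str, Optional[str], str]


def detect_updates(row_key: str, row: Row) -> List[Update]:
    """Compare the current and prior version of each cell in a row."""
    updates: List[Update] = []
    for column, versions in row.items():
        if not versions:
            continue
        current = versions[0][1]
        prior = versions[1][1] if len(versions) > 1 else None
        # No prior version means a new cell; a different value means an update.
        if prior != current:
            updates.append((row_key, column, prior, current))
    return updates


def to_hive_line(update: Update) -> str:
    """Tab-separated line for a Hive text external table; None becomes \\N."""
    return "\t".join("\\N" if field is None else field for field in update)


row = {
    "plan": [(200, "GOLD"), (100, "SILVER")],
    "zip": [(200, "60090"), (100, "60090")],
    "email": [(200, "a@example.com")],
}
# Prints the plan change (SILVER to GOLD) and the new email cell; zip is unchanged.
for update in detect_updates("member-1", row):
    print(to_hive_line(update))
```

In a real job the HBase scan would need to request at least two versions per cell, and the table's column family would have to retain that many versions.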

Confidential - Nashville, TN

Java Developer

Responsibilities:

  • Developed all the User Interfaces using JSP and Struts framework.
  • Wrote client-side validations using JavaScript.
  • Extensively used jQuery to develop interactive web pages.
  • Developed the DAO layer using Hibernate, and used Hibernate caching for real-time performance.
  • Experience in developing web services for production systems using SOAP and WSDL.
  • Developed the user interface presentation screens using HTML, XML and CSS.
  • Experience working with Spring using AOP, IoC and JdbcTemplate.
  • Developed shell scripts to trigger the Java batch job and send a summary email with the batch job status and processing summary.
  • Coordinated with the QA lead on the test plan, test cases, test code and actual testing; responsible for allocating defects and ensuring they were resolved.
  • Involved in testing and deployment of the application on WebLogic Application Server during the integration and QA testing phase.
  • Maintained the existing code base developed in the Struts, Spring and Hibernate frameworks by incorporating new features and fixing bugs.
  • Used Ant to build and deploy applications.
  • Involved in configuring web.xml and struts.xml for workflow.
  • Wrote SQL queries and created DDL scripts for interacting with the Oracle database.
  • Documented common problems prior to go-live and while in a production support role.

Environment: JDK, JSP, Servlets, Hibernate, WebLogic, AJAX, Web Services, XML, ANT, DB2, JUnit.
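The batch-trigger and summary-email shell scripts from this role can be sketched in Python as an illustration. This is not a reconstruction of the original scripts. The sketch runs the job, derives a status from its exit code, and composes the summary email with the standard-library `email` package. The job command, job name, counts and addresses are hypothetical placeholders, and sending is left as a commented `smtplib` call.

```python
import subprocess
from email.message import EmailMessage
from typing import List


def run_batch(command: List[str]) -> str:
    """Run the batch command and map its exit code to a status label."""
    result = subprocess.run(command, capture_output=True, text=True)
    return "SUCCESS" if result.returncode == 0 else "FAILED"


def build_batch_summary(
    job_name: str,
    status: str,
    processed: int,
    failed: int,
    sender: str,
    recipients: List[str],
) -> EmailMessage:
    """Compose (but do not send) the status and processing summary email."""
    msg = EmailMessage()
    msg["Subject"] = f"[{status}] {job_name} batch summary"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Job: {job_name}\n"
        f"Status: {status}\n"
        f"Records processed: {processed}\n"
        f"Records failed: {failed}\n"
    )
    return msg


# Hypothetical usage (command, counts and addresses are placeholders):
#   status = run_batch(["java", "-jar", "batch-job.jar"])
#   msg = build_batch_summary("nightly-load", status, 1200, 3,
#                             "batch@example.com", ["team@example.com"])
#   with smtplib.SMTP("mail.example.com") as smtp:
#       smtp.send_message(msg)
```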

Confidential

JAVA Developer

Responsibilities:

  • Developed the presentation tier with the Struts framework (MVC 2 model), consisting of Action classes and related configuration settings.
  • Proactively involved in the development of Java Server Pages (JSP) and Java classes.
  • Designed and generated reports using JasperReports.
  • Wrote Data Access Object (DAO) classes to access the database and incorporate CRUD operations.
  • Actively participated in the end-to-end life cycle of the project.
  • Analyzed the specifications provided by the functional team.
  • Design and Development using Oracle, Servlets, XML and XSL.
  • Was responsible for leading the support team.

Environment: J2EE, MySQL 5.5, Struts 2.X, Jasper Reports, Tomcat 7.0, Eclipse Juno, JDeveloper, Servlets, XML, XSL, Oracle 10g.
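The DAO classes described in this role keep SQL out of the rest of the application behind create, read, update and delete methods. Below is a minimal sketch of that pattern in Python. The standard-library `sqlite3` module stands in for the MySQL/Oracle database and the JDBC-based Java code used on the project. The `customer` table and the `CustomerDao` class are hypothetical.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    id: Optional[int]
    name: str
    email: str


class CustomerDao:
    """Hypothetical DAO: callers see CRUD methods, never SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS customer ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "name TEXT NOT NULL, email TEXT NOT NULL)"
        )

    def create(self, customer: Customer) -> int:
        with self.conn:  # commit on success, roll back on error
            cur = self.conn.execute(
                "INSERT INTO customer (name, email) VALUES (?, ?)",
                (customer.name, customer.email),
            )
        return cur.lastrowid

    def find(self, customer_id: int) -> Optional[Customer]:
        row = self.conn.execute(
            "SELECT id, name, email FROM customer WHERE id = ?", (customer_id,)
        ).fetchone()
        return Customer(*row) if row else None

    def update(self, customer: Customer) -> bool:
        with self.conn:
            cur = self.conn.execute(
                "UPDATE customer SET name = ?, email = ? WHERE id = ?",
                (customer.name, customer.email, customer.id),
            )
        return cur.rowcount == 1

    def delete(self, customer_id: int) -> bool:
        with self.conn:
            cur = self.conn.execute(
                "DELETE FROM customer WHERE id = ?", (customer_id,)
            )
        return cur.rowcount == 1
```

Parameterized queries (`?` placeholders) do the same job as a JDBC `PreparedStatement`: values are bound rather than concatenated into SQL.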

Confidential

SQL Developer

Responsibilities:

  • Analyzed Business requirements based on the Business Requirement Specification document.
  • Performed extensive query analysis and tuning using indexes and hints, and wrote numerous complex queries involving subqueries, correlated queries, UNION/UNION ALL, MINUS, inline views and analytic functions.
  • Developed program specifications for PL/SQL Procedures and Functions to do the data migration and conversion.
  • Created a wide range of data types, tables, index types and scoped variables.
  • Designed the front end interface for the users, using Oracle Forms.
  • Involved in database development by creating Oracle PL/SQL Functions, Procedures, Triggers, Packages, Records and Collections.
  • Involved in development of an ETL process using SQL*Loader and PL/SQL packages.
  • Developed and customized Forms/Reports Using Oracle D2K.
  • Designed data layouts and developed reports using Oracle D2K.
  • Implemented batch jobs (shell scripts) for loading database tables from Flat Files using SQL*Loader.
  • Participated in Performance Tuning using Explain Plan.
  • Created numerous database triggers using PL/SQL.
  • Involved in Technical Documentation, Unit test, Integration Test, writing the Test plan and version controlling with CVS.
  • Created UNIX shell and Perl scripts for data file handling and manipulations.

Environment: Oracle 9i/10g, SQL, PL/SQL, SQL*Plus, Oracle D2K, SQL*Loader.
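The flat-file batch loads in this role used SQL*Loader, which is driven by a control file rather than application code. The Python sketch below only illustrates the same idea under assumptions:

- It reads a comma-delimited file and inserts rows in batches.
- It sets aside malformed lines, much as SQL*Loader writes rejected records to a bad file.
- `sqlite3` stands in for Oracle.
- The `employee` table and its id,name,salary layout are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, TextIO, Tuple


def _flush(conn: sqlite3.Connection, batch: list) -> int:
    """Insert one batch in a single transaction and empty the buffer."""
    if not batch:
        return 0
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO employee (id, name, salary) VALUES (?, ?, ?)", batch
        )
    count = len(batch)
    batch.clear()
    return count


def load_flat_file(
    conn: sqlite3.Connection, source: TextIO, batch_size: int = 500
) -> Tuple[int, List[str]]:
    """Load comma-delimited id,name,salary lines; return (loaded, rejected)."""
    loaded, rejected, batch = 0, [], []
    for raw in source:
        line = raw.rstrip("\n")
        if not line:
            continue
        fields = next(csv.reader([line]))
        try:
            emp_id, name, salary = fields
            batch.append((int(emp_id), name, float(salary)))
        except ValueError:
            # Wrong field count or non-numeric value: keep the raw line,
            # similar in spirit to SQL*Loader's bad file.
            rejected.append(line)
            continue
        if len(batch) >= batch_size:
            loaded += _flush(conn, batch)
    loaded += _flush(conn, batch)
    return loaded, rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
sample = io.StringIO("1,Ann,5000\n2,Bob,abc\n3,Cy,4200.50\n")
print(load_flat_file(conn, sample, batch_size=2))  # (2, ['2,Bob,abc'])
```

Committing once per batch rather than once per row keeps transaction overhead low. The batch size plays a role similar to SQL*Loader's `ROWS` setting for conventional-path loads.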
