Spark/Hadoop Developer Resume

CA

SUMMARY

  • 8+ years of IT experience in software development and support, including developing strategies for deploying Big Data technologies to efficiently meet Big Data processing requirements.
  • Experience with distributed systems, large-scale non-relational data stores, RDBMS, NoSQL map-reduce systems, data modelling, database performance, and multi-terabyte data warehouses.
  • Working experience in Hadoop framework, Hadoop Distributed File System and Parallel Processing implementation.
  • Hands-on experience with the overall Hadoop eco-system - HDFS, Map Reduce, Pig/Hive, HBase.
  • Working experience building and supporting large-scale Hadoop environments, including design, configuration, installation, performance tuning and monitoring.
  • Excellent understanding / knowledge of Hadoop architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and Map Reduce programming paradigm.
  • Experience in writing custom UDFs in Java for Hive and Pig.
  • Experience in writing custom partitioners and counters in Map Reduce programs in Java.
  • Experience in installation, configuration and management of development, testing and production Hadoop Cluster.
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Experience in working with Flume to load the log data from multiple sources directly into HDFS.
  • Experience in designing both time driven and data driven automated workflows using Oozie.
  • Experience in loading log data into HDFS using Flume.
  • Experience with the Spark environment.
  • Experience in Scala programming.
  • Experience in writing shell scripts.
  • Experience working with Java, J2EE, JDBC, ODBC, JSP, Java Eclipse, Java Beans, EJB, Servlets, MS SQL Server and React.js.
  • Hands on experience in application development using the technologies Java, RDBMS, Linux/Unix shell scripting and Linux internals.
  • Developed custom directives in AngularJS that can be reused like templates across the application, including for DOM manipulation.
  • Experience in using IDEs like Eclipse and NetBeans.
  • Experience working with Spring.
  • Development experience in Oracle.
  • Expert-level experience with Informatica and solid working knowledge of SQL and ETL load and extract processes within Informatica.
  • Experience in developing very complex mappings, reusable transformations, sessions and workflows using the Informatica ETL tool to extract data from various sources and load it into targets.
  • Experience with client-side design and validation using HTML, DHTML and JavaScript.
  • Experience in user interface design using HTML, DHTML, CSS, JavaScript and Photoshop.
  • Knowledge of various NoSQL databases like HBase, Cassandra, Accumulo and MongoDB.
  • Knowledge of data integration tools such as Talend and Pentaho.
  • Expert in writing SQL queries and database programming using PL/SQL.
  • Quick learning skills and effective team spirit with good communication skills.
  • Strong analytical and problem-solving skills.

TECHNICAL SKILLS

Hadoop Technologies / Big Data Ecosystem: HDFS, HBase, Impala, Hadoop Map Reduce, ZooKeeper, Hive, Pig, Sqoop, Flume, Oozie, Cassandra, Accumulo, Pentaho, Spark, AWS.

Programming languages: SQL, PL/SQL, C, C++, Python.

Web Technologies: HTML, XML, AJAX, SOAP, ODBC, JDBC, Java Beans, EJB, MVC, JSP, Servlets, Java Mail, Struts, JUnit, JavaScript, AngularJS, React.js, DHCP

Application / Web Servers: WebLogic 10.3, IBM WebSphere 7.0, Apache Tomcat, JBoss, SOA

Build Tools: ANT, Maven, Struts, Spring, Hibernate, JSF.

Frameworks: MVC, Spring, Struts, Hibernate, .NET

Data Warehousing and NoSQL Databases: HBase.

Databases: Oracle, MS SQL Server, MySQL

Operating Systems: Unix / Linux, Windows 2000/NT/XP/7

PROFESSIONAL EXPERIENCE

Spark/Hadoop Developer

Confidential, CA

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Work with business stakeholders to understand requirements / business use cases.
  • Responsible for creating Hive tables, loading the structured data resulting from Map Reduce jobs into the tables, and writing Hive queries to further analyse the logs to identify issues and behavioural patterns.
  • Proficient in installing, configuring, migrating and upgrading data across Hadoop Map Reduce, Hive, HDFS, HBase, Sqoop, Oozie, Scala, ZooKeeper and GoldenGate.
  • Experience leveraging Hadoop ecosystem components, including Hive for data analysis, Sqoop and GoldenGate for data migration, TWS for scheduling, and HBase as a NoSQL data store.
  • Used Sqoop and Oracle GoldenGate for data ingestion from relational databases.
  • Proficient in implementing HBase and Spark SQL.
  • Expertise in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism, and memory tuning.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
  • Implemented Avro data formats for Apache Hive computations to handle custom business requirements.
  • Load data from large data files into Hive tables.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Develop data pipelines to consume data from the Enterprise Data Lake (MapR Hadoop distribution, Hive tables/HDFS) for analytics solutions.
  • Design and implement product features in collaboration with business and IT stakeholders.
  • Perform all phases of software engineering, including requirements analysis, application design, and code development & testing.
  • Work very closely with the Architecture group and drive technical solutions.
  • Design and develop innovative solutions to meet the needs of the business and interact with business partners and key contacts.
  • Design and develop dashboards / reports / data discovery workflows / analytics solution using big data analytics tools such as ServiceNow, Splunk & Tableau.
  • Tune the performance of Hive queries.
  • Implement process improvements (Automation, Performance tuning, Optimize workflows)
  • Support the implementation and drive it to stable state in production.
  • Review code and provide feedback on best practices, performance improvements, etc.
  • Troubleshoot production support issues post-deployment and provide solutions as required.
  • Perform functional and performance testing
  • Deploy big data solutions into production
  • Work with Infrastructure team in deploying solutions into production
  • Implement projects / use cases in an agile environment
  • Responsible for maintaining & monitoring data quality for Analytical dashboards / reports
  • Contribute to the knowledge repository by documenting technical design considerations, best practices, business workflows, project documentation, troubleshooting guides, etc.
  • Provide business user support. Troubleshoot production problems / functional issues with data visualization & discovery tools
  • Collaborate with business and IT partners such as Infrastructure team, Vendor team, internal teams and offshore team
  • Work with and guide the offshore team.

Environment: MapR v5.2, HDFS, Hive, Sqoop, IBM Tivoli Workload Scheduler, Unix/Linux, Teradata, IBM InfoSphere DataStage, HBase, Spark, Splunk, Elasticsearch, ServiceNow, Dart, Tableau.

Hadoop Developer

Confidential, CO

Responsibilities:

  • Responsible for migrating data from diversified data sources into Hadoop MELD platform.
  • Responsible for creating low level designs for code implementations.
  • Created Sqoop jobs for both Historical and Incremental data migration from legacy systems.
  • Developed re-usable SFTP utility for flat files migration into MELD platform.
  • Responsible for production support for the daily running jobs and monitoring them.
  • Responsible for creating Event Handler Classes for events processing.
  • Responsible for creating Hive tables using partitions, buckets, UDFs and HQL scripts in the landing layer for analytics.
  • Developed folder watcher utility for continuous data migration to HDFS.
  • Developed UC4 workflows for running sequential job flows in Production.
  • Created mapping documents from legacy systems to Hadoop.
  • Developed file parser utilities for parsing files as per requirements.
  • Used Informatica as an ETL tool to create source/target definitions, mappings and sessions to extract, transform and load data into staging tables from various sources.
  • Responsible for creating, tracking and delivering user stories as per sprint planning.

Environment: Hortonworks HDP-2.4.3 YARN cluster, HDFS, Hive, Spark SQL, HBase, Sqoop, UC4, Accumulo, Unix/Linux, Teradata.

Hadoop Developer

Confidential, San Jose, CA

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple Map Reduce programs in Java for Data Analysis
  • Wrote Map Reduce jobs using Pig Latin and the Java API.
  • Performed performance tuning and troubleshooting of Map Reduce jobs by analysing and reviewing Hadoop log files
  • Developed Pig scripts for analysing large data sets in HDFS.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Ran Spark queries for data processing.
  • Wrote Spark shell programs in Scala.
  • Shared data using Spark RDDs (resilient distributed datasets).
  • Responsible for creating Hive tables, loading the structured data resulting from Map Reduce jobs into the tables, and writing Hive queries to further analyse the logs to identify issues and behavioural patterns.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Worked on the Informatica PowerCenter tool: Source Analyzer, Data Warehousing Designer, Mapping Designer & Mapplets, and Transformations.
  • Extensively used the Informatica Data Quality tool (IDQ Developer) to create rule-based data validations for profiling.
  • Utilized Storm for processing large volumes of data.
  • Used Kafka to load data into HDFS and move data into NoSQL databases such as Cassandra.
  • Used Oozie Operational Services for batch processing and scheduling workflows dynamically.
  • Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
  • Involved in submitting and tracking Map Reduce jobs using Job Tracker.
  • Used Python for scripting.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
  • Implemented Hive generic UDFs to implement business logic.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.

Environment: Hadoop, Map Reduce, Python, HDFS, Pig, Hive, Sqoop, Flume, Oozie, Java, Unix/Linux, Teradata, ZooKeeper, Tableau, HBase, Cassandra, Kafka, Cloudera.

Hadoop Developer

Confidential, Cincinnati, Ohio

Responsibilities:

  • Drove the data mapping and data modelling exercise with the stakeholders.
  • Developed/captured/documented architectural best practices for building systems on AWS.
  • Used Pig as an ETL (Informatica) tool to perform transformations, event joins and pre-aggregations before storing the curated data in HDFS.
  • Launched and set up Hadoop-related tools on AWS, including configuring different Hadoop components.
  • Involved in submitting and tracking Map Reduce jobs using Job Tracker.
  • Involved in creating Oozie workflow and Coordinator jobs to kick off the jobs on time for data availability.
  • Used Pig as an ETL tool to do transformations, event joins, filtering and some pre-aggregations.
  • Used visualization tools such as Power View for Excel and Tableau for visualizing data and generating reports.
  • Exported data to Tableau and Excel with Power View for presentation and refinement.
  • Implemented business logic by writing Pig UDFs in Java and used various UDFs from Piggybank and other sources.
  • Implemented Hive generic UDFs to implement business logic.
  • Implemented test scripts to support test driven development and continuous integration.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Developed Pig scripts for analysing large data sets in HDFS.
  • Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
  • Designed and presented plan for POC on Impala.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote multiple Map Reduce programs in Java for data analysis.
  • Wrote Map Reduce jobs using Pig Latin and the Java API.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Experienced in migrating HiveQL to Impala to minimize query response time.
  • Knowledge of handling Hive queries using Spark SQL, which integrates with the Spark environment.
  • Implemented Avro and Parquet data formats for Apache Hive computations to handle custom business requirements.
  • Worked on Sequence files, RC files, Map side joins, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Installed Hadoop, Map Reduce and HDFS on AWS, and developed multiple Map Reduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Implemented daily cron jobs that automate parallel tasks of loading data into HDFS using Oozie coordinator jobs.
  • Responsible for performing extensive data validation using Hive; created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
  • Used Informatica as an ETL tool to extract data from source systems to target systems.
  • Trained and mentored the analyst and test teams on the Hadoop framework, HDFS, Map Reduce concepts and the Hadoop ecosystem.
  • Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
  • Involved in loading data from Teradata database into HDFS using Sqoop queries.

Environment: Apache Hadoop, Agile, Map Reduce, HDFS, Azure, Pig, Hive, Sqoop, Flume, Oozie, Scala, Java, RDBMS, Linux, ETL, Maven, AWS, Teradata, ZooKeeper, Tableau.

Hadoop Developer

Confidential, Richfield, MN

Responsibilities:

  • Implemented web service calls for different integrations.
  • Implemented an MDM solution using Hadoop and Cassandra.
  • Worked as a developer for Big Data and Java/J2EE solutions.
  • Developed applications using Java, J2EE, JSP and Spring.
  • Involved in requirements and design activities.
  • Involved in job scheduling using the Maestro scheduler.
  • Involved in system and manual testing while integrating with different data integration projects.
  • Involved in build and deployment activities.
  • Involved in bug-fixing activities as part of CRs from the customer.
  • Wrote Hive queries and shell scripts for data integration.
  • Worked as a member of the Big Data team on deliverables such as design, construction, unit testing and deployment.
  • Created Hive tables.
  • Loaded data from large data files into Hive tables.
  • Involved in writing shell scripts for executing Hive queries.
  • Involved in gathering requirement and design.
  • Performed initial setup to receive data from external sources.
  • Designed and developed a Hive job to merge incremental files.
  • Performed analysis and design of production views.
  • Involved in writing Map/Reduce jobs using Java.
  • Involved in writing various user-defined functions as per the requirements.
  • Translated functional and technical requirements into detailed architecture and design.
  • Responsible for managing data coming from different sources.
  • Experienced in analysing data with Hive.
  • Responsible for operational support of Production system

Environment: Java, J2EE, JSP, Spring, Hibernate, Hadoop, Hive, Linux, Cassandra, UNIX, Solaris, Tomcat 6, log4j, Eclipse, SVN.

Java/J2EE Developer

Confidential

Responsibilities:

  • Gathered and analysed user/business requirements and developed System test plans.
  • Managed the project using Test Director, added test categories and test details.
  • Involved in using various PeopleSoft Modules.
  • Performed execution of test cases manually to verify the expected results.
  • Created Recovery Scenarios for the application exception handling using recovery manager.
  • Implemented cross cutting concerns as aspects at Service layer using Spring AOP.
  • Involved in the implementation of DAO objects using Spring ORM.
  • Involved in the JMS Connection Pool and the implementation of publish and subscribe using Spring JMS.
  • Implemented various screens for the front end using React.js and used various predefined components from NPM (Node Package Manager) library.
  • Developed a dynamic single-page application using React.js and Angular 2.
  • Used JMS Template to publish and Message Driven POJO (MDP) to subscribe from the JMS provider.
  • Developed custom directives in AngularJS that could be reused like templates across the application.
  • Extensively used AngularJS to consume RESTful web services.
  • Implemented XML Schema as part of XQuery Query Language
  • Designed dynamic, browser-compatible pages using HTML5, CSS3, Bootstrap, jQuery, JavaScript, Angular.js and Knockout.js.
  • Involved in creating the Hibernate POJOs and developed Hibernate mapping files.
  • Used Hibernate, an object/relational mapping (ORM) solution, for mapping data.
  • Involved in the gap analysis of the use cases and requirements.
  • Developed test scenarios for test automation.

Environment: Windows 98, Java 1.4, C, C++, JSP, Angular.js, React.js, Servlets, J2EE, PHP, Multi-threading, OO design, JDBC, HTML, RAD, WSAD.

Java/J2EE Developer

Confidential

Responsibilities:

  • Developed the web tier using JSP, Struts MVC to show account details and summary.
  • Used Struts Tiles Framework in the presentation tier.
  • Designed and developed the UI using Struts view component, JSP, HTML, CSS and JavaScript.
  • Used AJAX for asynchronous communication with server
  • Utilized Hibernate for Object/Relational Mapping purposes for transparent persistence onto the SQL Server database.
  • Performed ETL mapping testing, correction and enhancement, and resolved data integrity issues.
  • Involved in writing Spring configuration XML files that contain object declarations and the declarations of their dependent objects.
  • Involved in developing XML compilers using XQuery.
  • Used Tomcat web server for development purpose.
  • Involved in creating and running test cases for JUnit testing.
  • Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
  • Used CVS for version control.
  • Developed the application using Eclipse and used Maven as the build and deployment tool.

Environment: Java, J2EE Servlet, JSP, JUnit, AJAX, XML, JSON, CSS, JavaScript, Spring, Struts, Hibernate, Eclipse, Apache Tomcat, and Oracle.
