
Sr. Hadoop/Spark Developer Resume


PROFESSIONAL SUMMARY:

  • Professional software developer with 8+ years of technical expertise across all phases of the Software Development Life Cycle (SDLC) in industry sectors such as Banking, Financial Services, Auto Insurance and Health Care, specializing in Big Data analytics frameworks and Java/J2EE technologies.
  • 4+ years of in-depth experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Oozie, Avro, Sqoop, AWS, ZooKeeper and Spark integration with Cassandra.
  • Excellent understanding and extensive knowledge of Hadoop architecture and its ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and the MapReduce programming paradigm.
  • Experienced with Apache Hadoop as well as the enterprise distributions from Cloudera and Hortonworks; good knowledge of the MapR distribution and Amazon EMR.
  • Hands-on expertise in row-key and schema design for NoSQL databases such as MongoDB 3.0.1, HBase, Cassandra and DynamoDB (AWS).
  • Extensively worked on Spark Streaming and Apache Kafka to fetch live stream data.
  • Experience in converting Hive/SQL queries into RDD transformations using Apache Spark, Scala and Python (a brief sketch follows this list).
  • Hands-on experience in developing Spark applications using RDD transformations, Spark Core, Spark Streaming and Spark SQL.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Expertise in developing Pig Latin scripts and using Hive Query Language.
  • Created User Defined Functions (UDFs), User Defined Aggregated Functions (UDAFs) in PIG and Hive.
  • Created Hive tables to store structured data into HDFS and processed it using HiveQL.
  • Experience in validating and cleansing the data using Pig statements and hands-on experience in developing Pig Macros.
  • Working knowledge in installing and maintaining Cassandra by configuring the cassandra.yaml file as per the business requirement and performed reads/writes using Java JDBC connectivity.
  • Experience in writing Complex SQL queries, PL/SQL, Views, Stored procedure, triggers, etc.
  • Experience in OLTP and OLAP design, development, testing and support of enterprise Data warehouses.
  • Wrote multiple MapReduce jobs using the Java API, Pig and Hive for data extraction, transformation and aggregation from multiple file formats including Parquet, Avro, XML, JSON, CSV and ORC, with different compression codecs (GZIP, Snappy, LZO).
  • Valuable experience in the practical implementation of AWS cloud services including IAM, Elastic Compute Cloud (EC2), ElastiCache, Simple Storage Service (S3), CloudFormation, Virtual Private Cloud (VPC), Route 53, Lambda and EBS.
  • Good experience in optimizing MapReduce algorithms using mappers, reducers, combiners and partitioners to deliver the best results for large datasets.
  • Ran Apache Hadoop, CDH and MapR distributions on Amazon Elastic MapReduce (EMR) clusters backed by EC2.
  • Good knowledge of build tools such as Maven and Ant, and of the Log4j logging framework.
  • Knowledge of installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH3, CDH4) distributions and on Amazon Web Services (AWS).
  • Experienced in writing ad hoc queries using Cloudera Impala, including Impala analytical functions.
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, MapReduce Programming Paradigm, High Availability and YARN architecture.
  • Proficient in developing, deploying and managing Solr from development to production.
  • Experience in importing data using Sqoop and SFTP from sources such as RDBMS, Teradata, Mainframes, Oracle and Netezza into HDFS, and in performing transformations on it using Hive, Pig and Spark.
  • Used project management tools such as JIRA for tracking issues and bugs, GitHub for code reviews, and version control tools such as CVS, Git and SVN.
  • Expertise in complete Software Development Life Cycle (SDLC) in Waterfall and Agile, Scrum models.
  • Excellent communication, interpersonal and problem-solving skills; a strong team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management and customers.
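As a brief illustration of the Hive/SQL-to-RDD conversion mentioned above, the following is a minimal sketch using the Spark Java API (written with Java 8 lambdas for brevity); the input path, CSV layout and output path are illustrative assumptions rather than details of any specific engagement:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class OrderTotalsSketch {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("OrderTotalsSketch"));

            // Equivalent of: SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
            JavaRDD<String[]> orders = sc.textFile("hdfs:///data/orders.csv")   // hypothetical path
                    .map(line -> line.split(","));                              // CSV: customer_id,amount,...

            JavaPairRDD<String, Double> totals = orders
                    .mapToPair(f -> new Tuple2<>(f[0], Double.parseDouble(f[1])))
                    .reduceByKey(Double::sum);                                  // aggregate per customer

            totals.saveAsTextFile("hdfs:///out/order_totals");                  // hypothetical output path
            sc.stop();
        }
    }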

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Storm, Scala, Spark, Apache Kafka, Flume, Solr, Elasticsearch, Ambari, Ab Initio.

Database: Oracle 10g/11g, PL/SQL, MySQL, MS SQL Server 2012

SQL Server Tools: Enterprise Manager, SQL Profiler, Query Analyzer, SQL Server 2008, SQL Server 2005 Management Studio, DTS, SSIS, SSRS, SSAS

Languages: Java, Python, Scala, SQL, JavaScript and C/C++

AWS Components: S3, EMR, EC2, Lambda, VPC, Route 53, CloudWatch

Testing: JUnit, Selenium WebDriver

Web Design Tools: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, AngularJS and JSON

Development / Build Tools: Eclipse, Jenkins, Git, Ant, Maven, IntelliJ, JUnit and Log4j.

NoSQL Databases: Cassandra, MongoDB, HBase and Amazon DynamoDB.

DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle

Operating Systems: UNIX, Red Hat Linux, macOS and Windows variants

Testing: Hadoop MRUnit testing, Hive testing, Quality Center (QC)

ETL Tools: Talend, Informatica, Pentaho

PROFESSIONAL EXPERIENCE:

Confidential

Sr. Hadoop/Spark Developer

Responsibilities:

  • Developed Spark applications using Scala and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Developed Kafka consumers in Scala for consuming data from Kafka topics.
  • Consumed XML messages from Kafka and processed them using Spark Streaming to capture UI updates.
  • Developed a preprocessing job using Spark DataFrames to flatten JSON documents into flat files.
  • Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Used Spark Streaming APIs to perform transformations and actions on the fly for building a common learner data model that gets data from Kafka in near real time and persists it to Cassandra (see the sketch after this list).
  • Worked and learned a great deal from AWS Cloud services like EC2, S3, EBS, RDS and VPC.
  • Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for processing and storage of small data sets, and maintained the Hadoop cluster on AWS EMR.
  • Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
  • Implemented Elastic Search on Hive data warehouse platform.
  • Experienced in writing real-time processing and core jobs using Spark Streaming with Kafka as the data pipeline.
  • Worked with Elastic MapReduce and set up the Hadoop environment on AWS EC2 instances.
  • Used HiveQL to analyze partitioned and bucketed data and executed Hive queries on Parquet tables stored in Hive to perform data analysis meeting the business specification logic.
  • Used Kafka capabilities such as partitioning, replication and the replicated commit log to maintain message feeds.
  • Used Apache Kafka to aggregate web log data from multiple servers and make it available to downstream systems for data analysis and engineering work.
  • Experience in using Avro, Parquet, RC file and JSON file formats, developed UDFs in Hive and Pig.
  • Worked with Log4j framework for logging debug, info & error data.
  • Good understanding of Cassandra architecture, replication strategy, gossip, snitch etc.
  • Designed column families in Cassandra, ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra as per the business requirement.
  • Used the Spark DataStax Cassandra connector to load data to and from Cassandra.
  • Experienced in creating data models for the client's transactional logs and analyzed the data from Cassandra tables for quick searching, sorting and grouping using the Cassandra Query Language (CQL).
  • Tested the cluster Performance using Cassandra-stress tool to measure and improve the Read/Writes.
  • Performed transformations such as event joins, bot-traffic filtering and pre-aggregations using Pig.
  • Developed custom Pig UDFs in Java and used UDFs from PiggyBank for sorting and preparing the data.
  • Developed custom loaders and storage classes in Pig to work with data formats such as JSON, XML and CSV, generating bags for further processing in Pig.
  • Used Amazon DynamoDB to gather and track the event based metrics.
  • Developed Sqoop and Kafka Jobs to load data from RDBMS, External Systems into HDFS and HIVE.
  • Developed Oozie coordinators to schedule Pig and Hive scripts to create Data pipelines.
  • Wrote several MapReduce jobs using the Java API and used Jenkins for continuous integration.
  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Generated various kinds of reports using Power BI and Tableau based on Client specification.
  • Used Jira for bug tracking and Git to check-in and checkout code changes.
  • Responsible for generating actionable insights from complex data to drive real business results for various application teams and worked in Agile Methodology projects extensively.
  • Worked with Scrum team in delivering agreed user stories on time for every Sprint.
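A minimal sketch of the Kafka-to-Spark Streaming flow referenced above, using the Spark 1.x Java API with the Kafka 0.8 direct-stream integration (Java 8 lambdas for brevity); the broker address and topic name are hypothetical, and the Cassandra write is only indicated in a comment rather than shown:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    import kafka.serializer.StringDecoder;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    public class KafkaStreamingSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaStreamingSketch");
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, String> kafkaParams = new HashMap<>();
            kafkaParams.put("metadata.broker.list", "broker1:9092");   // hypothetical broker
            Set<String> topics = Collections.singleton("ui-updates");  // hypothetical topic

            // Direct (receiver-less) stream from Kafka; values carry the message payloads
            JavaPairInputDStream<String, String> stream = KafkaUtils.createDirectStream(
                    jssc, String.class, String.class, StringDecoder.class, StringDecoder.class,
                    kafkaParams, topics);

            JavaDStream<String> payloads = stream.map(record -> record._2());

            // In the real job each micro-batch would be parsed and persisted to Cassandra,
            // e.g. via the DataStax Spark Cassandra connector; here we just print a sample.
            payloads.print();

            jssc.start();
            jssc.awaitTermination();
        }
    }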

Environment: Hortonworks HDP, Spark, Spark Streaming, Spark SQL, AWS EMR, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, Solr, Jenkins, Eclipse, Oracle, Git, Oozie, Tableau, SOAP, NiFi, Cassandra, Agile Methodologies.

Confidential, Naperville, IL

Hadoop Developer.

Responsibilities:

  • Worked on migrating MapReduce programs, initially written in Python, into Spark transformations using Spark and Scala.
  • Developed Spark jobs using Scala on top of YARN/MRv2 for interactive and batch analysis.
  • Experienced in querying data using Spark SQL on top of the Spark engine for faster processing of data sets (a brief sketch follows this list).
  • Worked on integrating the Spark framework with a Java-based web framework.
  • Worked on Ad hoc queries, Indexing, Replication, Load balancing, Aggregation in MongoDB.
  • Processed web server logs by developing multi-hop Flume agents using Avro sinks and loaded the data into MongoDB for further analysis; also extracted files from MongoDB through Flume and processed them.
  • Extracted and restructured the data into MongoDB using import and export command line utility tool.
  • Experience in setting up fan-out workflows in Flume, designing a V-shaped architecture that takes data from many sources and ingests it into a single sink.
  • Experience in creating, dropping and altering tables at run time without blocking updates and queries, using HBase and Hive.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Imported several transactional logs from web servers with Flume to ingest the data into HDFS, using Flume with a spooling directory source to load data from the local file system (LFS) into HDFS.
  • Installed and configured Pig and wrote Pig Latin scripts to convert data from text files to Avro format.
  • Created Partitioned Hive tables and worked on them using HiveQL.
  • Worked with Apache Solr to implement indexing and wrote Custom Solr query segments to optimize the search.
  • Wrote Java code to format XML documents and uploaded them to the Solr server for indexing.
  • Experienced with Apache Solr for indexing and load-balanced querying to search for specific data in larger datasets, and implemented a near-real-time Solr index on HBase and HDFS.
  • Used Impala connectivity from the user interface (UI) and queried the results using Impala QL.
  • Wrote and implemented Teradata FastLoad jobs, DML and DDL.
  • Installed and configured Talend ETL in single- and multi-server environments.
  • Experience in monitoring Hadoop cluster using Cloudera Manager, interacting with Cloudera support and log the issues in Cloudera portal and fixing them as per the recommendations.
  • Experience in Cloudera Hadoop Upgrades and Patches and Installation of Ecosystem Products through Cloudera manager along with Cloudera Manager Upgrade.
  • Worked with the continuous integration tool Jenkins and automated jar file builds at the end of each day.
  • Worked with Tableau and Integrated Hive, Tableau Desktop reports and published to Tableau Server.
  • Developed data pipelines using Pig and Java MapReduce to ingest customer behavioral data and financial records into HDFS for analysis.
  • Experience in setting up the whole application stack, and in setting up and debugging Logstash to send Apache logs to AWS Elasticsearch.
  • Developed MapReduce programs in Java for parsing the raw data and populating staging Tables.
  • Developed Unix shell scripts to load large number of files into HDFS from Linux File System.
  • Used Zookeeper to coordinate the servers in clusters and to maintain the data consistency.
  • Experienced in designing RESTful services using Java-based APIs such as Jersey.
  • Worked in Agile development environment. Actively involved in daily Scrum and other design related meetings.
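A minimal sketch of the Spark SQL-on-Hive querying referenced above, using the Spark 1.x Java API and HiveContext; the table name, partition column and query are illustrative assumptions:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.hive.HiveContext;

    public class SparkSqlOnHiveSketch {
        public static void main(String[] args) {
            JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("SparkSqlOnHiveSketch"));
            HiveContext hiveContext = new HiveContext(sc.sc());

            // Query a day-partitioned Hive table (table and column names are hypothetical)
            DataFrame clicks = hiveContext.sql(
                    "SELECT page, COUNT(*) AS hits FROM web_logs WHERE dt = '2016-01-01' GROUP BY page");

            for (Row row : clicks.collect()) {
                System.out.println(row.getString(0) + "\t" + row.getLong(1));
            }
            sc.stop();
        }
    }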

Environment: Hadoop, HDFS, Hive, MapReduce, AWS EC2, Solr, Impala, MySQL, Oracle, Sqoop, Kafka, Spark, SQL, Talend, Python, PySpark, YARN, Pig, Oozie, Linux (Ubuntu), Scala, Ab Initio, Tableau, Maven, Jenkins, Java (JDK 1.6), Cloudera, JUnit, Agile methodologies.

Confidential, Buffalo, NY

Hadoop Developer

Responsibilities:

  • Responsible for building Data lake in Hadoop using Cloudera Distributions.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Imported the web logs using Flume.
  • Wrote schema for JSON data and converted that data to Avro file format to utilize space efficiently and to make schema changes easy.
  • Used Pig to convert the fixed width file to delimited file.
  • Developed shell scripts to automate loading of tables from database and performed the incremental loads using Sqoop.
  • Experienced in creating Hive External tables on top of HDFS data to query and analyze.
  • Used Hive join queries to join multiple tables according to end user requirement.
  • Used HUE for running Hive queries. Created partitions according to day using Hive to improve performance.
  • Wrote UDFs in Java and embedded them within Hive queries (a brief sketch follows this list).
  • Performed performance tuning and troubleshooting of Map Reduce jobs by analyzing and reviewing Hadoop log files
  • Loaded some of the data into HBase for fast retrieval of data.
  • Worked on Avro files, bucketing, partitioning for Hive performance enhancement and storage improvement.
  • Wrote Oozie workflow jobs to execute set of shell scripts, Sqoop, Hive jobs. Also wrote coordinated jobs to trigger the workflow according to time.
  • Responsible for performing data validation using Hive.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
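A minimal sketch of the kind of Hive UDF in Java referenced above; the class name, function name and behaviour (trimming and lower-casing a string column) are illustrative assumptions:

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF that trims and lower-cases a string column.
    // Registered and embedded from Hive with, for example:
    //   ADD JAR udfs.jar;
    //   CREATE TEMPORARY FUNCTION normalize_str AS 'NormalizeStringUDF';
    //   SELECT normalize_str(name) FROM customers;
    public class NormalizeStringUDF extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;   // pass nulls through unchanged
            }
            return new Text(input.toString().trim().toLowerCase());
        }
    }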

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Sqoop, Oozie, Java, Linux, Shell Script, HBase, JSON, XML.

Confidential, NJ

Hadoop Developer

Responsibilities:

  • Analyzed Hadoop cluster and different Big Data analytic tools including Pig, Hive, HBase and Sqoop.
  • Installed Hadoop, MapReduce, HDFS, and developed multiple MapReduce jobs in PIG and Hive for data cleaning and pre-processing.
  • Used Hive to analyze the partitioned and bucketed data to compute various metrics for reporting
  • Written Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
  • Created Hive tables, loaded the data and Performed data manipulations using Hive queries in MapReduce Execution Mode.
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Involved in processing ingested raw data using MapReduce, Apache Pig and HBase.
  • Worked on developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
  • Involved in scheduling Oozie workflow engine to run multiple Hive and Pig jobs.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.

Environment: CDH 3.x and 4.x, Hadoop, Hive, Map Reduce, Pig, HDFS, Sqoop, Oozie.

Confidential

Java Developer

Responsibilities:

  • Involved in all the phases of the life cycle of the project from requirements gathering to quality assurance testing.
  • Developed Class diagrams, Sequence diagrams using Rational Rose.
  • Responsible for developing rich web interface modules with Struts tags, JSP, JSTL, CSS, JavaScript, Ajax and GWT.
  • Developed presentation layer using Struts framework, and performed validations using Struts Validator plugin.
  • Created SQL script for the Oracle database.
  • Implemented the business logic using Java, Spring transaction management and Spring AOP.
  • Implemented persistence layer using Spring JDBC to store and update data in database.
  • Produced web service using WSDL/SOAP standard.
  • Implemented J2EE design patterns like Singleton Pattern with Factory Pattern.
  • Extensively involved in the creation of the Session Beans and MDB, using EJB 3.0.
  • Used Hibernate framework for Persistence layer.
  • Extensively involved in writing Stored Procedures for data retrieval and data storage and updates in Oracle database using Hibernate.
  • Deployed and built the application using Maven.
  • Performed testing using JUnit.
  • Used JIRA to track bugs.
  • Extensively used Log4j for logging throughout the application.
  • Produced a web service using REST with the Jersey implementation for providing customer information (a brief sketch follows this list).
  • Used SVN for source code versioning and code repository.
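A minimal sketch of a Jersey (JAX-RS) resource of the kind referenced above for exposing customer information; the path, field names and hard-coded response are illustrative assumptions, with the DAO/service lookup omitted:

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.PathParam;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.MediaType;

    // Hypothetical REST resource exposing customer information as JSON.
    @Path("/customers")
    public class CustomerResource {

        @GET
        @Path("/{id}")
        @Produces(MediaType.APPLICATION_JSON)
        public String getCustomer(@PathParam("id") String id) {
            // In the real service this would be looked up through the DAO/service layer.
            return "{\"id\":\"" + id + "\",\"name\":\"Sample Customer\"}";
        }
    }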

Environment: Java (JDK 1.5), J2EE, Eclipse, JSP, JavaScript, JSTL, Ajax, GWT, Log4j, CSS, XML, Spring, EJB, MDB, Hibernate, WebLogic, REST, Rational Rose, JUnit, Maven, JIRA, SVN.

Confidential

Java Developer

Responsibilities:

  • Configured and built a Spring MVC application on the Tomcat web server (a brief sketch follows this list).
  • Designed and implemented the backend layer using Hibernate.
  • Developed user interface using JSP and HTML.
  • Used JDBC for the Database connectivity.
  • Developed Junit for server-side code.
  • Built, tested and debugged JSP pages for critical modules in the system.
  • Involved in multi-tiered J2EE design utilizing the Spring Inversion of Control (IoC) architecture and Hibernate.
  • Applied design patterns including MVC pattern, Abstract Factory Pattern, DAO Pattern and Singleton.
  • Extensively used JMX API for management and monitoring solutions.
  • Involved in developing front end screens using JSP, DHTML, HTML, CSS, AJAX and JavaScript.
  • Developed Web services to allow communication between applications through SOAP over HTTP using Apache Axis2.
  • Used the Spring framework for dependency injection and integrated it with EJB 3.0 using annotations.
  • Generated XML files for the configured beans. The business logic was written in EJB DAO classes and the service layer classes were configured in Spring-service.xml.
  • Made use of content negotiation (XML, JSON, text/plain) using JAXB, GSON.
  • Investigated, debugged and fixed potential bugs and defects in the implemented code.
  • Implemented JUnit tests and mandated a minimum of 90% code coverage.
  • Involved in requirements gathering and prototype development.
  • Provided post-production support for the project during business hours.
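A minimal sketch of a Spring MVC controller of the kind referenced above; the URL mapping, model attribute and view name are illustrative assumptions, with the Hibernate-backed lookup omitted:

    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    // Hypothetical controller serving an account-detail page rendered by a JSP view.
    @Controller
    public class AccountController {

        @RequestMapping(value = "/accounts/{id}", method = RequestMethod.GET)
        public String viewAccount(@PathVariable("id") String id, Model model) {
            // The account itself would normally be loaded via the Hibernate-backed DAO layer.
            model.addAttribute("accountId", id);
            return "accountDetail";   // logical view name resolved to a JSP
        }
    }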

Environment: Solaris, JSP, JSF, Spring, Servlets, Hibernate, Struts, JMS, JCA, JMX, XML, CSS, XML Schema, AJAX (jQuery), JSON, JAXP, DOM, SOAP, JavaScript, PL/SQL, DHTML, Maven, Log4j, JUnit, WebLogic 8.0, Oracle 9i RDBMS, Eclipse 3.2.
