
Sr. Hadoop/Spark Developer Resume

Chicago, IL

SUMMARY:

  • 12+ years of experience as a software professional, spanning requirement gathering, analysis, design, implementation, and testing of software products using Java/J2EE technologies and Big Data technologies in the Hadoop ecosystem.
  • Over 4 years of experience in working with different Hadoop ecosystem components such as HDFS, MapReduce, HBase, Spark, Yarn, Kafka, Zookeeper, PIG, HIVE, Sqoop, Storm, Oozie, and Flume.
  • Good experience in creating data ingestion pipelines, data transformations, data management, data governance, and real-time streaming engines at an enterprise level.
  • Expertise in Java application development and client/server applications using Core Java, J2EE technologies, Web Services, REST services, Oracle, SQL Server, and other relational databases.
  • Involved in creating analytical models used for recommendations, risk modeling, fraud detection and prevention, sentiment analysis, and clickstream analysis.
  • Very good experience in real-time data streaming solutions using Apache Spark (Spark SQL, Spark Streaming, MLlib, GraphX), Apache Storm, Kafka, and Flume.
  • Very good knowledge of big data ingestion techniques using Sqoop, Flume, Kafka, the native HDFS Java API, REST APIs, HttpFS, and WebHDFS.
  • Worked on cluster maintenance, including troubleshooting, management, and performance-related configuration tuning.
  • Experience in working with various Hadoop distributions like Cloudera, Hortonworks and MapR.
  • Good experience in implementing end-to-end data security and governance within the Hadoop platform using Apache Knox, Apache Sentry, and Kerberos.
  • Experience with different NoSQL databases like HBase, Accumulo, Cassandra, and MongoDB.
  • Worked with different file formats like AVRO, ORC, Parquet while moving data into and out of HDFS.
  • Experience with Apache Phoenix to access the data stored in HBase.
  • Good experience in designing, planning, administering, installing, configuring, troubleshooting, performance monitoring, and fine-tuning of Cassandra clusters.
  • Excellent knowledge on CQL (Cassandra Query Language), for retrieving the data present in Cassandra cluster by running queries in CQL.
  • Worked with Amazon Web Services (AWS) EC2, S3, EMR, Redshift, and DynamoDB.
  • Experience in software design using UML, including use-case, requirements, and component diagrams.
  • Experience in data mining and business intelligence tools such as Tableau, QlikView, and MicroStrategy.
  • Experience in automating tasks with Python Scripting and Shell Scripting.
  • Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart. Well versed with Star-Schema & Snowflake schemas for designing the Data Marts.
  • Developed ETL Scripts for Data acquisition and Transformation using Informatica and Talend.
  • Good experience and understanding of Enterprise Data warehouse (EDW) architecture and possess End to End knowledge of EDW functioning.
  • Experienced in using Agile methodologies including extreme programming, SCRUM and Test-Driven Development (TDD).
  • Strong knowledge of System Testing, User Acceptance testing and software quality assurance best practices and methodologies
  • Experience in building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat, and JBoss.

TECHNICAL SKILLS:

Hadoop/Big Data: MapReduce, HDFS, Hive, Pig, HBase, Sqoop, Spark, Spark SQL, Spark Streaming, Kafka, Flume, Storm, Zookeeper, Phoenix, Oozie, Impala, Hue, Cloudera Manager, Ambari

Distributed Platforms: Cloudera, Hortonworks, MapR

Programming Languages: C, C++, C#, Java, Scala, Python, R

Java/J2EE Technologies: Servlets, JSP, JSF, JDBC, Java Beans, RMI & Web services (SOAP, RESTful)

Frameworks: Struts, Hibernate, Spring MVC, Microservices Architecture

Development Tools: Eclipse, Net Beans, SBT, ANT, Maven, Jenkins, Bamboo, SOAP UI, QC, Selenium WebDriver, Jira, Bugzilla, SQL Developer, Talend, Informatica

Methodologies: Agile/Scrum, UML, and Waterfall

NoSQL Technologies: Cassandra, MongoDB, HBase, Accumulo, Dynamo DB

Databases: Oracle 12c, MySQL, MS SQL Server, PostgreSQL

Web/ Application Servers: Apache Tomcat, WebLogic, WebSphere

Version Control: Git, SVN

Visualization: Tableau, MicroStrategy, and QlikView

Web Technologies: HTML, CSS, XML, JavaScript, jQuery, AngularJS, Node.js, AJAX, SOAP, and REST

Scripting Languages: Unix Shell Scripting, Perl

PROFESSIONAL EXPERIENCE:

Sr. Hadoop/Spark Developer

Confidential - Chicago, IL

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for managing and scheduling Jobs on a Hadoop cluster.
  • Loading data from UNIX file system to HDFS and vice versa.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Worked with Apache Spark for large data processing integrated with functional programming language Scala.
  • Developed POC using Scala, Spark SQL and MLlib libraries along with Kafka and other tools as per requirement then deployed on the Yarn cluster.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved it in Parquet format in HDFS.
  • Implemented Data Ingestion in real time processing using Kafka.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster data processing.
  • Configured Spark Streaming to receive real-time data and store the stream data in HDFS.
  • Developed Spark scripts using Scala shell commands as per requirements.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Documented the requirements including the available code which should be implemented using Spark, Hive, HDFS and SOLR.
  • Tested Apache Tez, an extensible framework for building high-performance batch and interactive data-processing applications, on Pig and Hive jobs.
  • Configured Spark Streaming with Kafka to ingest information and store it in HDFS.
  • Developed multiple Kafka Producers and Consumers as per the software requirement specifications.
  • Performed real-time streaming of data using Spark with Kafka.
  • Responsible for creating Hive tables and working on them using Hive QL.
  • Implementing various Hive UDF's as per business requirements.
  • Exported the analyzed data to the databases using Sqoop for visualization and to generate reports for the BI team.
  • Involved in Data Visualization using Tableau for Reporting from Hive Tables.
  • Developed Python Mapper and Reducer scripts and implemented them using Hadoop Streaming.
  • Developed multiple Map Reduce jobs in java for data cleaning and preprocessing.
  • Optimized Map Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Responsible for writing Hive queries for data analysis to meet the business requirements.
  • Customized Apache Solr to handle fallback searching and provide custom functions.
  • Responsible for setup and benchmarking of Hadoop/HBase clusters.

Environment: Hadoop, HDFS, HBase, Sqoop, Hive, Map Reduce, Spark Streaming/SQL, Scala, Kafka, Solr, SBT, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins.
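The Python mapper and reducer scripts run through Hadoop Streaming, as described above, might look like the sketch below. The record layout and the choice of key field are assumptions for illustration, not the original code.

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit a tab-separated (key, 1) pair per record.
    The key is the first field of a tab-delimited record -- an
    assumed clickstream page id, purely for illustration."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:
            yield f"{fields[0]}\t1"

def reducer(lines):
    """Reduce step: sum the counts per key. Hadoop Streaming hands
    the reducer its input sorted by key, so grouping contiguous
    keys is sufficient."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"

# In a real job each function would live in its own script reading
# sys.stdin and printing to stdout, wired up roughly as:
#   hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py ...
```

In a live job the shuffle-and-sort phase between the two scripts is what guarantees the sorted input the reducer relies on.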

Confidential, Richmond, VA

Sr. Spark and Hadoop Developer

Responsibilities:

  • Built a Spark framework with Scala and migrated existing PySpark applications to it to improve runtime and performance.
  • Built a File Watcher service in Java to consume files, write them to HDFS, and audit the progress in HBase.
  • Implemented the ELK (Elasticsearch, Logstash, Kibana) stack to collect and analyze the logs produced by the StreamSets pipeline.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Control-M.
  • Developed BTEQ and FastExport scripts to load data from HDFS to Teradata.
  • Implemented a framework with the Teradata connector to import and export data between Teradata and Hadoop.
  • Improved performance to export data from Hadoop to Teradata by using performance tuning techniques.
  • Involved in installation of Tez and improved the query performance.
  • Involved in installation of Apache Phoenix in Dev and Test Environment.
  • Built a dashboard to audit incoming file status using Apache Phoenix.
  • Implemented StreamSets data quality functionality in Java.

Environment: Cloudera Distribution, HDFS, Map Reduce, Hue, Impala, Hive, Spark, Kafka, Sqoop, Pig, StreamSets, HBase, Control-M, Scala, Java, Eclipse, Shell Scripts, Teradata

Confidential, Bellevue, WA

Spark and Hadoop Developer

Responsibilities:

  • Created various Spark applications using Scala to perform various aggregations with enterprise data of the users.
  • Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
  • Developed custom FTP adaptors to pull the compressed files from FTP servers to HDFS directly using HDFS File System API.
  • Worked on real-time streaming using Apache Storm with RabbitMQ to process compressed files and write them to HDFS and HBase.
  • Implemented a Python script to fetch data from AWS S3, audit its status in HBase, and load it into Hive.
  • Implemented batch processing of jobs using Spark Scala API.
  • Used Spark SQL and Data Frame API extensively to build spark applications.
  • Used Spark SQL for data analysis and provided the results to data scientists for further analysis.
  • Closely worked with data science team in building Spark MLlib applications to build various predictive models.
  • Implemented Spark using Scala and utilizing Spark Core, Spark Streaming and Spark SQL API for faster processing of data instead of MapReduce in Java.
  • Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
  • Developed Sqoop scripts to import/export data from RDBMS to HDFS and Hive tables and vice versa.
  • Worked on analyzing Hadoop clusters using Big Data Analytic tools including Map Reduce, Pig and Hive.
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
  • Developed Java Code to read from Rest API and write to HDFS as ORC files.
  • Worked in Spark to read the data from Hive and write it to Cassandra using Java.
  • Developed Shell Scripts and Python Programs to automate tasks.
  • Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
  • Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
  • Loaded the final processed data into HBase tables to allow the downstream application team to build rich, data-driven applications.
  • Wrote Phoenix queries on top of HBase tables to boost query performance.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Created partitioned tables and loaded data using both static partition and dynamic partition methods.
  • Involved in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Implemented MapReduce programs to handle semi/ unstructured data like XML, JSON files or log files.
  • Managed jobs using the Fair Scheduler and developed job-processing scripts using Control-M.
  • Worked on a cluster of 1,230 nodes, processing roughly 12 TB of data per day.
  • Set up continuous integration for applications using Jenkins.
  • Used Reporting tools like Tableau to connect with Hive for generating daily reports of data.
  • Implemented Spark using Scala and utilizing Data Frames and Spark SQL API for faster processing of data.
  • Developed framework to load data from HDFS to Teradata using Spark JDBC API.
  • Developed BTEQ and FastExport scripts to load data from HDFS to Teradata.
  • Used Sqoop job to import the data from RDBMS using Incremental Import. Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.

Environment: Hortonworks Distribution, Hadoop, Hive, Spark, Scala, Kafka, Akka, Cassandra, Apache Storm, Python, Java, Zookeeper, Map Reduce, Sqoop, HDFS, Oozie, HBase, SQL, Shell Scripting, Teradata, RabbitMQ, YARN, Mesos
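The static and dynamic partition loading mentioned above can be illustrated with HiveQL along these lines; the table and column names are hypothetical, chosen only to show the two methods side by side.

```sql
-- Hypothetical table for illustration.
CREATE TABLE user_events (
  user_id STRING,
  action  STRING
)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- Static partition: the partition value is stated explicitly.
INSERT OVERWRITE TABLE user_events PARTITION (event_date = '2016-01-01')
SELECT user_id, action FROM staging_events WHERE dt = '2016-01-01';

-- Dynamic partition: Hive derives the partition value from the last
-- column of the SELECT; dynamic partitioning must be enabled first.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
INSERT OVERWRITE TABLE user_events PARTITION (event_date)
SELECT user_id, action, dt AS event_date FROM staging_events;
```

Dynamic partitioning avoids one INSERT statement per date, at the cost of the `nonstrict` mode setting when no static partition column is supplied.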

Confidential, Phoenix, AZ

Sr. Hadoop Developer

Responsibilities:

  • Created various Spark applications using Scala to perform enrichment of clickstream data with enterprise user data.
  • Developed custom FTP adaptors to pull the clickstream data from FTP servers to HDFS directly using HDFS File System API.
  • Implemented batch processing of jobs using Spark Scala API.
  • Used Spark SQL and Data Frame API extensively to build spark applications.
  • Used Spark SQL for data analysis and provided the results to data scientists for further analysis.
  • Closely worked with data science team in building Spark MLlib applications to build various predictive models.
  • Developed multiple Map Reduce jobs in Java for complex business requirements including data cleansing and preprocessing.
  • Migrating existing on-premise applications and services to AWS.
  • Implemented installation and configuration of multi-node cluster on the cloud using Amazon Web Services (AWS) on EC2.
  • Used cloud computing on the multi-node cluster, deployed the Hadoop application on cloud S3 storage, and used Elastic MapReduce (EMR) to run MapReduce jobs.
  • Developed Sqoop scripts to import/export data from RDBMS to HDFS and Hive tables and vice versa.
  • Worked on analyzing Hadoop clusters using Big Data Analytic tools including Map Reduce, Pig and Hive.
  • Stored the data in tabular formats using Hive tables and Hive SerDe's.
  • Optimized the Hive tables using optimization techniques like partitions and bucketing to provide better performance with Hive QL queries.
  • Worked on implementing streaming through Apache Kafka and Spark.
  • Used Spark Streaming to consume topics from the distributed messaging source Kafka and periodically push batches of data to Spark for real-time processing.
  • Involved in building and managing NoSQL databases such as HBase and Cassandra.
  • Worked in Spark to read the data from Hive and write it to Cassandra using Java.
  • Involved in developing Pig scripts/Pig UDF and to store unstructured data into HDFS.
  • Involved in designing various stages of migrating data from RDBMS to Cassandra.
  • Developed Shell Scripts and Python Programs to automate tasks.
  • Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming and loading data into data warehouse.
  • Involved in building the ETL architecture and Source to Target mapping to load data into Data warehouse.
  • Loaded the final processed data to HBase tables to allow downstream application team to build rich and data driven applications.
  • Wrote Phoenix queries on top of HBase tables to boost query performance.
  • Involved in writing the shell scripts for exporting log files to Hadoop cluster through automated process.
  • Created partitioned tables and loaded data using both static partition and dynamic partition methods.
  • Used Oozie for automating the end to end data pipelines and Oozie coordinators for scheduling the work flows.
  • Involved in cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
  • Analyzed the Hadoop log files using Pig scripts to track down errors.
  • Implemented MapReduce programs to handle semi/ unstructured data like XML, JSON files and sequence files for log files.
  • Held daily scrum calls on the status of deliverables with business users, stakeholders, and the client, and drove periodic review meetings.
  • Involved in setting up the QA environment and written unit test cases using MRUnit.

Environment: MapR Distribution, Cassandra 2.1, HDFS, Map Reduce, Hive, Spark, Kafka, Sqoop, Pig, HBase, Oozie, Scala, Java, Eclipse, Shell Scripts, Oracle 10g, Windows, Linux, AWS.
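The Oozie scheduling work in this role (end-to-end pipelines driven by coordinators) typically takes the shape below; the application name, schedule, and paths are hypothetical placeholders, not the original configuration.

```xml
<!-- Hypothetical names, dates, and paths for illustration. -->
<coordinator-app name="daily-ingest-coord" frequency="${coord:days(1)}"
                 start="2015-01-01T00:00Z" end="2016-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <app-path>${nameNode}/apps/oozie/ingest-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```

The coordinator fires the referenced workflow once per day; the workflow itself chains the Sqoop, Pig, and Hive actions that make up the pipeline.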

Confidential, Irving, TX

Hadoop Developer

Responsibilities:

  • Creating end to end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities on user behavioral data.
  • Implemented a POC to migrate Map Reduce jobs into Spark RDD transformations using Scala.
  • Loaded customer profile, spending, and credit data from legacy warehouses onto HDFS using Sqoop.
  • Installed and configured Hadoop Map Reduce, HDFS, developed multiple Map Reduce jobs in Java for data cleaning and preprocessing.
  • Worked in the transition team which primarily worked on migration of Informatica to Hadoop .
  • Built data pipeline using Pig and Java Map Reduce to store onto HDFS.
  • Used Oozie to orchestrate the map reduce jobs that extract the data on a timely manner.
  • Used pattern-matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
  • Used Apache Phoenix to access the data stored in HBase.
  • Performed unit testing using MRUnit.
  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Developed simple to complex Map/reduce Jobs using Hive and Pig.
  • Worked on Real Time/Near Real Time data processing using Flume and Storm.
  • Optimized Map/Reduce Jobs to use HDFS efficiently by using various compression mechanisms like Gzip, SNAPPY, LZO etc.
  • Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
  • Implemented Pig scripts integrated them into Oozie workflows and performed integrated testing.
  • Used Sqoop job to import the data from RDBMS using Incremental Import. Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Responsible for writing Hive queries for analyzing data in the Hive warehouse using Hive Query Language (HQL).
  • Developed HIVE and Pig queries and provided support for data analysts.
  • Extensively worked on data ingestion between heterogeneous RDBMS systems and HDFS using Sqoop.
  • Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
  • Exported the result set from Hive to MySQL using Shell scripts.

Environment: Cloudera Distribution, Hadoop, Hive, Zookeeper, Map Reduce, Sqoop, Pig 0.10 and 0.11, JDK1.6, HDFS, Flume, Oozie, Informatica 9.5, DB2, HBase, PL/SQL, SQL, Shell Scripting.
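The Sqoop incremental imports used in this role generally follow the pattern below. The connection string, table, and check column are hypothetical, and the command needs a live Sqoop/Hadoop environment, so it is shown for illustration only.

```shell
# Hypothetical connection details; requires a live cluster to run.
sqoop job --create daily_orders_import -- import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user --password-file /user/etl/.dbpass \
  --table orders \
  --incremental append \
  --check-column order_id \
  --last-value 0 \
  --target-dir /data/raw/orders
# Each `sqoop job --exec daily_orders_import` run then imports only
# rows whose order_id exceeds the last value Sqoop recorded.
```

Saving the import as a Sqoop job lets Sqoop track `--last-value` in its metastore between runs, which is what makes the import incremental.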

Confidential

Sr. Java Developer

Responsibilities:

  • Involved in the Analysis, Design, Development, and Testing phases of Software Development Lifecycle (SDLC).
  • Developed user interface using JSP, JavaScript, CSS and HTML.
  • Implemented AJAX to allow dynamic loading, improved interaction and rich look to the User Interface for admin portal.
  • Implementation of J2EE Design Patterns like Singleton, Session Facade and Data Access Objects.
  • Used Hibernate for Object Relation Database Mapping Java classes.
  • Used Spring 3.0 with JMS to establish interactive communication between different domains.
  • Designed and developed a web-based client using Servlets, JSP, Java Script, Tag Libraries, CSS, HTML and XML.
  • Designed Java classes using Spring Framework to implement the Model View Control (MVC) architecture.
  • Good Experience in consuming and exposing SOAP and Restful Web services.
  • Wrote complex SQL queries and programmed stored procedures, packages and triggers using Oracle 10g.
  • Performed Module and Unit Level Testing with JUnit and Log4j.
  • Used JBoss 6.0 as the application server.

Environment: Java 1.5, JDBC, Rest API, Hibernate 3, Spring 3, Servlets, JSPs, XML, XSLT, HTML, MXML, JavaScript, Maven, CVS, Log4j, JUnit, PL/SQL, Oracle 9i, Jboss 6, Eclipse IDE.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in back-end and front-end developing team. Took part in developing, maintaining, reviewing and supporting quality code and services.
  • Involved in Daily SCRUM meetings and weekly SPRINT Meetings.
  • Developed web applications using J2EE technologies such as Java Server Pages (JSP) and Servlets, built on the Struts 1.2 framework.
  • Implemented Action classes, Action Forms, and Struts tag libraries using the Struts framework.
  • Defined and used XML schemas to define web service messages and used in WSDL.
  • Designing and developing of User Interfaces using JSP, HTML, and JavaScript.
  • Worked with business analysts to functionally decompose business capabilities into a set of discrete microservices.
  • Used JDBC, SQL and PL/SQL programming for storing, retrieving, manipulating the data.
  • Extracting, manipulating and updating the Oracle10g databases.
  • Extensively used the Eclipse Indigo 3.7 IDE and the SVN version control system for developing Java-based applications.
  • Deployed web applications on the Tomcat 5.0 web server and wrote XML-based Apache Ant 1.x build scripts.
  • Wrote test cases for unit testing with JUnit 4.2, as well as module and integration testing.

Environment: Java, JavaScript, HTML, JSP, Microservices Architecture, Servlets, Struts1.2, Eclipse Indigo 3.7, Ant1.x, Oracle10g, Tomcat 5.0.
