Sr. Hadoop/Spark Developer Resume

New York, NY

SUMMARY:

  • 7+ years of professional experience working with data, including 3+ years of hands-on experience in analysis, design, development and maintenance of Hadoop and Java-based applications.
  • Expertise in Hadoop architecture and components such as Kafka, Flume and MapReduce, with experience writing MapReduce programs on Apache Hadoop to analyze large data sets efficiently.
  • Extensive development experience across the Hadoop ecosystem, covering MapReduce, HDFS, YARN, Hive, Impala, Pig, HBase, Spark, Sqoop, Oozie and Cloudera.
  • In-depth understanding of the strategy and practical implementation of AWS cloud technologies including IAM, EC2, EMR, SNS, RDS, Redshift, Athena, DynamoDB, Lambda, CloudWatch, Auto Scaling, S3 and Route 53.
  • Strong experience in analyzing data using HiveQL, Spark SQL, HBase and custom MapReduce programs.
  • Imported and exported data into and out of HDFS and Hive using Sqoop.
  • Experience in writing shell scripts to dump shared data from MySQL servers to HDFS.
  • Working knowledge of Python and Scala for Spark development.
  • Knowledge of extracting an Avro schema using Avro-tools, XML using XSD and evolving an Avro schema by changing JSON files.
  • Experience in AWS cloud administration; actively involved in building highly available, scalable, cost-effective and fault-tolerant systems using multiple AWS services.
  • Strong problem-solving, organizing, team management, communication and planning skills, with the ability to work in a team environment. Able to write clear, well-documented, well-commented and efficient code as per the requirement.
  • Capable of processing large sets of structured, semi-structured and unstructured data and supporting systems application architecture.
  • Able to assess business rules, collaborate with stakeholders and perform source-to-target data mapping, design and review.
  • Good knowledge of NoSQL databases: Cassandra, MongoDB and HBase.
  • Worked on HBase to load and retrieve data for real-time processing using the REST API.
  • Experience in developing applications using Waterfall and Agile (XP and Scrum) methodologies.
  • Strong problem-solving and analytical skills, with the ability to make balanced, independent decisions.
  • Expertise in database design, creation and management of schemas, and writing stored procedures, functions, and DDL and DML SQL queries.
  • Experienced with build tools Ant and Maven and continuous integration tools like Jenkins.
  • An excellent team player and self-starter with good communication skills and proven abilities to finish tasks before target deadlines.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop, MapReduce, HDFS, HBase, Zookeeper, Hive, Pig, Sqoop, Spark, Cassandra, Oozie, Flume, Kafka and Talend

Programming Languages: Java, C/C++, Scala, Python and Shell Scripting

Scripting Languages: JavaScript, XML, HTML, Python and Linux Bash Shell Scripting, Unix

Tools: Eclipse, JDeveloper, JProbe, CVS, MS Visual Studio

Platforms: Windows (2000/XP), Linux, Solaris

Databases: NoSQL, Oracle, DB2, MS SQL Server (2000, 2008), Teradata, HBase, Cassandra, Cloudera 5.9

PROFESSIONAL EXPERIENCE:

Sr. Hadoop/Spark Developer

Confidential, New York, NY

Responsibilities:

  • Developed a data pipeline using Spark, Hive, Pig, Python, Impala and HBase to ingest customer behavioral data and financial histories into a Hadoop cluster for analysis.
  • Responsible for implementing a generic framework to handle different data collection methodologies from the client's primary data sources, validate and transform the data using Spark, and load it into S3.
  • Collected data from an AWS S3 bucket in near real time using Spark Streaming, performed the necessary transformations and aggregations on the fly to build the common learner data model, and persisted the data in HDFS.
  • Explored the use of Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL and Spark on YARN.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Involved in converting Hive/SQL queries into Spark Transformations using Spark RDDs and Scala.
  • Worked on the Spark SQL and Spark Streaming modules of Spark and used Scala and Python to write code for all Spark use cases.
  • Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
  • Migrated historical data to S3 and developed a reliable mechanism for processing the incremental updates.
  • Scheduled Spark jobs and Apache Airflow jobs inside EMR to read data from S3, transform it and load it into Postgres RDS.
  • Using Kafka, implemented a data solution to correlate data from SQL and NoSQL databases.
  • Wrote Spark scripts using Scala shell commands as per the requirement.
  • Analyzed data in Hive using the Spark API over Hortonworks.
  • Used the Oozie workflow engine to manage independent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
  • Monitored and debugged Hadoop jobs/applications running in production.
  • Worked on providing user support and application support on Hadoop infrastructure.
  • Evaluated and compared different tools for test data management with Hadoop.
  • Supported the testing team on Hadoop Application Testing.

Environment: Cloudera, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, AWS EC2, S3, EMR, RDS, Linux Shell Scripting, Postgres, MySQL.

Hadoop/Spark Developer

Confidential, Atlanta, GA

Responsibilities:

  • Involved in the high-level design of the Hadoop 2.6.3 architecture for the existing data structure and problem statement, set up a new cluster and configured the entire Hadoop platform.
  • Extracted files from MySQL, Oracle and Teradata through Sqoop 1.4.6 and placed them in distributed HDFS storage for processing.
  • Pushed data from Amazon S3 storage to Redshift using key-value pairs as required by the BI team.
  • Processed data on S3 using Athena and worked on gateway nodes and connectors (JAR files) connecting sources with the AWS cloud.
  • Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on EMR.
  • Worked with various HDFS file formats like Avro 1.7.6, SequenceFile and JSON, and various compression formats like Snappy and bzip2.
  • Continuously monitored and managed the Hadoop cluster using Ambari.
  • Used Pig to perform data validation on the data ingested using Sqoop and Flume, and pushed the cleansed data set into Hive.
  • Increased the performance of HiveQL queries by splitting larger queries into smaller ones and introducing temporary tables between them.
  • Implemented performance techniques such as partitioning and bucketing in Hive.
  • Designed and built the reporting application, which uses Spark SQL to fetch data from HBase tables and generate reports.
  • Developed a data pipeline using Kafka to ingest behavioral data and used Spark Streaming to filter the data and store it in HDFS.
  • Consumed data from Kafka topics using PySpark, parsed and transformed it using Python and built-in Spark functions, then stored it in Hive tables.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Developed custom Unix shell scripts to do pre- and post-validations of master and slave nodes, before and after configuring the NameNode and DataNodes respectively.
  • Drove the application from the development phase to production using a Continuous Integration and Continuous Deployment (CI/CD) model with Maven and Jenkins.
  • Developed small distributed applications in our projects using Zookeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.

Environment: Hadoop, Amazon S3, EMR, Redshift, HDFS, Hive, Impala, Spark, Scala, Python, Pig, Sqoop, Oozie, Git, Oracle, DB2, MySQL, UNIX Shell Scripting, JDBC.

Hadoop Developer

Confidential, Dallas, TX

Responsibilities:

  • Imported data from MySQL to HDFS using Sqoop on a regular basis.
  • Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Collected and aggregated large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Kafka and stored the data in HDFS for analysis.
  • Developed multiple Kafka producers and consumers from scratch as per the organization's requirements.
  • Set up Flume for different sources to bring log messages from outside into Hadoop HDFS.
  • Responsible for creating and modifying topics (Kafka queues) as and when required, with varying configurations involving replication factors, partitions and TTL.
  • Performed aggregations on large amounts of data using Apache Spark with Scala and landed the data in the Hive warehouse for further analysis.
  • Wrote and tested complex MapReduce jobs for aggregating identified and validated data.
  • Created Managed and External Hive tables with static/dynamic partitioning.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Increased the performance of HiveQL queries by splitting larger queries into smaller ones and introducing temporary tables between them.
  • Used an open-source web scraping framework for Python to crawl and extract data from web pages.
  • Optimized the Hive queries by setting different combinations of Hive parameters.
  • Developed UDFs (User Defined Functions) to extend the core functionality of Pig and Hive queries as per requirement.
  • Implemented workflows using Oozie for running MapReduce jobs and Hive queries.
  • Extensively involved in performance tuning of Impala queries by bucketing large tables.
  • Designed the extraction, transformation and loading (ETL) solutions using Informatica PowerCenter and the Teradata tools BTEQ, FastLoad, MLoad and TPump.

Environment: Apache Hadoop, HDFS, MapReduce, Hive, Sqoop, Kafka, Flume, Zookeeper, Spark, HBase, Python, Shell Scripting, Oozie.

Sr. Software Developer

Confidential

Responsibilities:

  • Used SOAP-based XML web services to transfer data to supply chain and domain expertise monitoring systems.
  • Used Maven as the build tool for building JAR files. Used the Hibernate framework (ORM) to interact with the database.
  • Knowledge of the Struts Tiles framework for layout management. Worked on the design, analysis, development and testing phases of the application.
  • Developed named HQL queries and Criteria for use in the application. Developed the user interface using JSP and HTML.
  • Used JDBC for database connectivity. Involved in projects using Java and Java EE web applications to create fully integrated client management systems.
  • Consistently met deadlines as well as requirements for all production work orders.
  • Executed SQL statements for searching contactors based on criteria. Developed and integrated the application using the Eclipse IDE.
  • Involved in building, testing and debugging JSP pages in the system. Involved in multi-tiered J2EE design using the Spring (IoC) architecture and Hibernate.
  • Involved in the development of front-end screens using technologies like JSP, HTML, AJAX and JavaScript.
  • Configured Spring-managed beans. Used the Spring Security API to configure security.

Environment: Java, J2EE, JSP, Hibernate, Struts, XML Schema, SOAP, JavaScript, PL/SQL, JUnit, AJAX, HQL, HTML, JDBC, Maven, Eclipse.

Associate Software Developer

Confidential

Responsibilities:

  • Involved in various phases of Software Development Life Cycle (SDLC), such as requirements gathering, modelling, analysis, design and development.
  • Ensured clear understanding of customer's requirements before developing the final proposal.
  • Generated Use case diagrams, Activity flow diagrams, Class diagrams and Object diagrams in the design phase.
  • Used Java Design Patterns like DAO, Singleton etc.
  • Wrote complex SQL queries for retrieving and updating data.
  • Involved in implementing a multithreaded environment to generate messages.
  • Used JDBC Connections and WebSphere Connection pool for database access.
  • Used Struts tag libraries (like html, logic, tab, bean etc.) and JSTL tags in the JSP pages.
  • Involved in development using Struts components: struts-config.xml, tiles, form-beans and plug-ins in the Struts architecture.
  • Involved in design and implementation of document-based Web Services.
  • Used prepared statements and callable statements to implement batch insertions and access stored procedures.
  • Involved in bug fixing and new enhancements.
  • Responsible for handling the production issues and provided solutions.
  • Configured connection pooling using WebLogic application server.
  • Developed and deployed the application on WebLogic using an Ant build.xml script.
  • Developed SQL queries and stored procedures to execute the backend processes using Oracle.
  • Deployed the application on WebLogic Application Server and carried out development using Eclipse.

Environment: Java 1.4, Servlets, JSP, JMS, Struts, Validation Framework, Tag Libraries, JSTL, JDBC, PL/SQL, HTML, JavaScript, Oracle 9i (SQL), UNIX, Eclipse 3.0, Linux, CVS.