Hadoop/ Spark Developer Resume New York City, NY - Hire IT People

PROFESSIONAL SUMMARY:

IT professional with 8+ years of experience in software design, development, deployment and maintenance of data analytics applications in the health, insurance, finance and retail sectors.
5+ years of experience in building high-performance Big Data applications, primarily using Hadoop ecosystem tools and the Spark framework.
Solid understanding of the architecture of the Hadoop framework and its ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark and Kafka.
Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
Expertise in developing production-ready Spark applications utilizing the Spark Core, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.
Experience in using DStreams in Spark Streaming, accumulators, broadcast variables, various levels of caching and optimization techniques in Spark.
Strong experience working with both batch and real-time processing using Spark framework.
Worked extensively on Hive for building complex data analytical applications.
Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques and multiple Hadoop input and output formats.
Worked extensively on Sqoop for performing bulk and incremental ingestion of large datasets from Teradata to HDFS.
Good experience working with AWS Cloud services such as S3, EMR, Redshift and Athena.
Deep understanding of performance tuning and partitioning for optimizing Spark applications.
Worked on building real time data workflows using Kafka, Spark streaming and HBase.
Extensive knowledge of NoSQL databases like HBase, Cassandra and MongoDB.
Strong working experience in extracting, wrangling, ingesting, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
Solid understanding of Hadoop MRv1 and Hadoop MRv2 (YARN) architecture.
Solid experience in working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
Extensive experience in performing ETL on structured, semi-structured data using Pig Latin Scripts.
Designed and implemented Hive and Pig UDFs using Java for evaluation, filtering, loading and storing of data.
Created Talend Mappings to populate the data into dimensions and fact tables.
Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
Experienced in job workflow scheduling and monitoring tools like Oozie.
Proficient knowledge of and hands-on experience in writing shell scripts in Linux.
Expertise in core Java and object-oriented design.
Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).

TECHNICAL SKILLS:

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Storm

Hadoop Distributions: CDH, HDP, AWS EMR

Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL

IDE Tools: Eclipse, NetBeans, IntelliJ.

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful.

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS

Reporting Tools / ETL Tools: Tableau, Power View for Microsoft Excel, Talend

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL Database (HBase, Cassandra, MongoDB)

Build Automation tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE:

Confidential, New York City, NY

Hadoop/ Spark Developer

Responsibilities:

Developed custom input adapters in Java for moving the data from raw sources (FTP, S3) to HDFS.
Developed Spark applications using Scala to perform data cleansing, data validation, data transformations and other enrichments.
Worked extensively on making the Spark applications production ready by implementing best practices to make them highly scalable and fault tolerant.
Developed many Spark applications for performing data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning exercise.
Worked on troubleshooting Spark applications to make them more error tolerant.
Worked on fine-tuning Spark applications to improve the overall processing time for the pipelines.
Wrote Kafka producers to stream the data from an external REST API to Kafka topics.
Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.
Utilized DataFrames and the Spark SQL API extensively wherever needed.
The data pipeline consisted of Sqoop, custom-built input adapters, Spark and Hive.
Performed Hive modeling and wrote many Hive scripts to carry out the data preparation needed for running machine learning models.
Worked closely with the data science team in automating and productionizing various models like logistic regression and k-means using Spark ML.
Built a working prototype of a real-time workflow for streaming user events from external applications.
Utilized Kafka and Spark Streaming for building the real time pipeline.
Worked on converting existing MapReduce jobs to Spark jobs.
Developed Oozie workflows to automate and productionize the data pipelines.
Implemented Logistic Regression and K-Means models and automated them to run in production.

Environment: Cloudera Distribution, Hadoop, HDFS, Spark, Scala, Kafka, HBase, Oozie, Hive, Flume, Sqoop, Java, SQL, Oracle 11g, Unix/Linux

Confidential, Bethlehem, PA

Hadoop Developer

Responsibilities:

Part of the Big Data Center of Excellence (CoE), responsible for designing and building enterprise data analytics solutions.
Worked with respective business units in understanding the scope of the analytics requirements.
Performed core ETL transformations in Spark.
Automated data pipelines which involve data ingestion, data cleansing, data preparation and data analytics.
Created end-to-end Spark applications using Scala to perform various data cleansing, validation, transformation and summarization activities on user behavioral data.
Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
Developed end-to-end data pipeline using FTP Adaptor, Spark, Hive and Impala.
Used Scala to write code for all Spark use cases.
Implemented design patterns in Scala for the application.
Implemented Spark applications in Scala and used Spark SQL heavily for faster development and processing of data.
Explored Spark for improving the performance and optimization of the existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
Imported other enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and then loaded the data into HBase tables.
Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
Created components like Hive UDFs to supply functionality missing in Hive for analytics.
Worked on various performance optimizations, such as using distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Created Oozie workflows and coordinators to automate data pipelines daily, weekly and monthly.
Automated Cluster creation and termination in AWS.

Environment: Hortonworks, HDFS, Hive, Sqoop, Flume, Spark, Scala, HBase, Kafka, Impala, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology

Confidential, Minnetonka, MN

Hadoop Developer

Responsibilities:

Worked with the business team to gather the requirements and participated in the Agile planning meetings to finalize the scope of each development.
Responsible for building scalable distributed data solutions on CDH.
Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
Implemented data pipelines by developing multiple mappers using the Chained Mappers API.
Developed multiple MapReduce batch jobs in Java for loading the data to HDFS in SequenceFile format.
Ingested structured data from a wide array of RDBMS to HDFS as incremental imports using Sqoop.
Wrote Pig scripts to wrangle the raw data, store it in HDFS and load it into Hive tables using HCatalog.
Configured Flume agents on different data sources to capture the streaming log data from the web servers.
Implemented Flume (multiplexing) to stream data from upstream pipes into HDFS.
Created Hive external tables with clustering and partitioning on the date for optimizing the performance of ad-hoc queries.
Involved in writing HiveQL scripts on Beeline, Impala and the Hive CLI for consumer data analysis to meet business requirements.
Exported data from Hive to DWH using Sqoop.
Worked with different file formats and compression techniques to ensure optimal performance of Hive queries.
Involved in creating Hive tables from a wide range of data formats like CSV, text, SequenceFile, Avro, Parquet, ORC, JSON and custom formats using SerDes.
Transformed the semi-structured log data to fit into the schema of the Hive tables using Pig.
Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
Involved in testing and in preparing low-level and high-level design documentation for the business requirements.

Environment: Cloudera Hadoop, Eclipse, MapReduce, Java, Sqoop, Pig, Oozie, Hive, Flume, CentOS, MySQL, Oracle DB.

Confidential, Overland Park, KS

Hadoop Engineer

Responsibilities:

Imported data from different relational data sources like Teradata and Oracle to HDFS using Sqoop.
Imported bulk data into HBase tables using MapReduce programs.
Inserted time series data in HBase using the HBase Java API.
Designed and implemented incremental imports into Hive tables.
Used the REST API to access HBase data to perform analytics.
Worked on loading and transforming large sets of structured, semi-structured and unstructured data.
Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
Experienced in managing and reviewing the Hadoop log files.
Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
Worked with the Avro data serialization system to handle JSON data formats.
Worked on different file formats like sequence files, XML files and map files using MapReduce programs.
Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
Exported data from the HDFS environment into RDBMS using Sqoop for report generation and visualization purposes.
Worked on Oozie workflow engine for job scheduling.

Environment: Hadoop, HDFS, MapReduce, Hive, Teradata, Oozie, Sqoop, Pig, Java, REST API, Maven, MRUnit, JUnit.

Confidential, Richfield, MN

Java/ Hadoop Developer

Responsibilities:

Developed web applications by coordinating requirements, user stories, use cases, screen mockups, schedules, and activities.
Worked closely with client business stakeholders on Agile development teams.
Supported users by developing documentation and assistance tools.
Developed the presentation layer using the Spring Framework, with multiple Spring modules such as Spring MVC and JDBC.
Implemented RESTful web services using Jersey to integrate different application components.
Developed RESTful web services for transmission of data in JSON/XML format.
Involved in writing SQL queries, functions, views, triggers and stored procedures for an Oracle relational database.
Used Sqoop to ingest structured data from Oracle database to HDFS.
Involved in writing and running MapReduce batch jobs in Java for data wrangling on the cluster.
Developed map-side and reduce-side joins using Distributed Cache on various data sets.
Developed Pig Latin scripts to transform the data according to the business requirement.
Developed Pig UDFs in Java, extending the eval and filter functions, to filter semi-structured data.

Environment: Java, J2EE, Eclipse, JSP, Servlets, Spring, JavaScript, HTML, RESTful, shell scripting, XML, Oracle 10g, Cloudera Hadoop, Map Reduce, Pig, HDFS.

Confidential

Java/ J2EE Developer

Responsibilities:

Involved in analysis, design and development of web applications based on J2EE.
Used the Struts framework for managing navigation and page flow.
Developed an EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
Designed the user interface using HTML, CSS, JavaScript and jQuery.
Used Log4j to debug and generate new logs for the application.
Used JDBC for accessing the data from the Oracle database. Created database tables, stored procedures using PL/SQL in Oracle DB.
Implemented client-side validation on web forms as per the requirements.
Developed code to convert JSON data into custom JavaScript objects.
Developed Servlets and JSPs based on MVC pattern using Struts framework.
Provided support for Production and Implementation Issues. Involved in end-user/client training of the application.
Performed Unit Tests on the application to verify and identify various scenarios.
Used Eclipse for development, testing and code review.
Involved in the release management process to QA/UAT/Production regions.
Used Maven for building the application EAR for deployment on WebLogic application servers.
Developed the project in an Agile environment.

Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.

Confidential

Java Developer

Responsibilities:

Involved in various phases of the Software Development Life Cycle (SDLC) of the application, such as requirement gathering, design, development and documentation.
The application was designed using J2EE design patterns and technologies based on MVC architecture.
Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
Developed custom tags, JSTL to support custom User Interfaces.
Handled business logic in the model using helper classes, and used Servlets as the controller to manage application flow and server-side validation.
Involved in Servlet and JavaBean programming on the server side for communication between clients and the server.
Implemented web services to integrate different applications (internal and third-party components) using RESTful services.
Wrote unit tests covering positive and negative test cases.
Developed the maven scripts for preparing WAR files used to deploy J2EE components.
Created tables, views, triggers, stored procedures on MySQL server for data manipulation and retrieval.
Used JDBC for connectivity to the database server and to invoke stored procedures.
Used Log4J to capture the log that includes runtime exceptions.
Involved in Bug fixing and functionality enhancements.
Developed the project using the Waterfall model.

Environment: J2EE, Java, UNIX, Red Hat, PuTTY, MVC, JSP, JDBC, Eclipse IDE, Apache Tomcat, CSS, HTML, JavaScript, SQL Server.
