Hadoop Developer / Spark Developer Resume
San Francisco, CA
PROFESSIONAL SUMMARY:
- Over 8 years of professional IT experience, including 5+ years in Big Data, Hadoop development and data analytics, along with design and development of Java-based enterprise applications.
- Very strong knowledge of Hadoop ecosystem components such as HDFS, Spark, Hive, HBase, Sqoop, Kafka, MapReduce, Pig and Oozie.
- Strong knowledge of distributed systems architecture and parallel processing frameworks.
- In-depth understanding of the internals of the MapReduce framework and the Spark execution model.
- Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark MLlib and Spark Streaming APIs.
- Experience with different Hadoop distributions such as Cloudera (CDH 3, 4 and 5), Hortonworks Data Platform (HDP) and Amazon Elastic MapReduce (EMR).
- Worked extensively on fine-tuning long-running Spark applications to achieve better parallelism and free more executor memory for caching.
- Strong experience with both batch and real-time processing using the Spark framework.
- Strong knowledge of performance-tuning Hive queries and troubleshooting issues related to joins and memory exceptions in Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple data formats (see the sketch following this summary).
- Strong experience with Hadoop file formats such as Avro and the columnar RCFile, ORC and Parquet formats.
- Hands-on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Experience in optimizing MapReduce jobs using combiners and custom partitioners.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS) and Java Database Connectivity (JDBC).
- Experience with NoSQL databases such as HBase, Apache Cassandra and MongoDB, and their integration with Hadoop clusters.
- Experienced in writing custom MapReduce programs and UDFs in Java to extend Hive and Pig core functionality.
- Extensive experience in ETL processes covering data sourcing, mapping, transformation, conversion and loading.
- Created Talend mappings to populate data into dimension and fact tables.
- Broad design, development and testing experience with Talend Integration Suite, and knowledge of performance tuning of mappings.
- Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
- Experience working with Hadoop clusters using Cloudera, Amazon AWS and Hortonworks distributions.
- Experience in installation, configuration, support and management of a Hadoop Cluster.
- Knowledge of UNIX shell scripting for automating deployments and other routine tasks.
- Experienced in agile methodologies including Extreme Programming, Scrum and Test-Driven Development (TDD).
- Proficient in integrating and configuring the object-relational mapping tool Hibernate in J2EE applications, along with other open-source frameworks such as Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands-on experience in developing logging standards and mechanisms based on Log4j.
- Experience in building, deploying and integrating applications with Ant and Maven.
- Good knowledge of REST and SOAP web services, WSDL, and XML parsers such as SAX.
- Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
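A minimal sketch of the partitioned external Hive table pattern with a JSON SerDe described above, written in Scala with a Hive-enabled SparkSession; the table, columns and HDFS path are hypothetical placeholders rather than details from a specific engagement, and Hive's bundled JSON SerDe (hive-hcatalog-core) is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object ClickstreamTableSetup {
  def main(args: Array[String]): Unit = {
    // Hive-enabled SparkSession; all names and paths below are illustrative
    val spark = SparkSession.builder()
      .appName("clickstream-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw JSON events, partitioned by ingestion date,
    // using Hive's JSON SerDe to parse each line
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        |  user_id   STRING,
        |  event     STRING,
        |  event_ts  BIGINT
        |)
        |PARTITIONED BY (dt STRING)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |STORED AS TEXTFILE
        |LOCATION '/data/raw/events'""".stripMargin)

    // Register a newly landed partition so queries can see it
    spark.sql("ALTER TABLE raw_events ADD IF NOT EXISTS PARTITION (dt='2017-01-01')")

    spark.stop()
  }
}
```

When many partitions land at once, MSCK REPAIR TABLE can be used instead of adding partitions one by one.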
TECHNICAL SKILLS:
Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting
BigData/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie
NoSQL Databases: HBase, Cassandra, MongoDB
J2EE/Middleware: J2EE (Servlets 2.4, JSP 2.0, JDBC, JMS)
Database: Microsoft SQL Server, MySQL, Oracle, DB2
Cloud Computing Tools: Amazon AWS
Development Tools: Microsoft SQL Studio, Eclipse, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall
GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJs
Web/App Servers: WebLogic, WebSphere
Operating Systems: UNIX, Windows, Mac, Linux
Office Suite: Microsoft Office (Word/Excel/PowerPoint)
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Hadoop Developer / Spark Developer
Responsibilities:
- Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Developed many Spark applications for data cleansing, event enrichment, aggregation, de-normalization and preparation of data needed for machine learning.
- Worked on troubleshooting Spark applications to make them more fault tolerant.
- Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch at the end of this section).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other optimizations.
- Worked extensively with Sqoop for importing data from Oracle.
- Experience working with EMR clusters in the AWS cloud and with S3.
- Involved in creating Hive tables and loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools such as Tableau, connected to Impala, to generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Tableau, Oozie, Oracle, Linux
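A rough sketch of the Kafka-to-HBase streaming path referenced above, assuming the spark-streaming-kafka-0-10 integration and the HBase 1.x client API; the broker, topic, group, table and column-family names are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object EventStreamToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("events-to-hbase"), Seconds(10))

    // Broker, topic and group names below are illustrative
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-consumers",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("user-events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition rather than per record
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("user_events"))
        records.foreach { rec =>
          // Fall back to a random row key if the producer did not set a message key
          val rowKey = Option(rec.key()).getOrElse(java.util.UUID.randomUUID().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```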
Confidential, Minneapolis, MN
Hadoop/Spark Developer
Responsibilities:
- Used Cloudera distribution extensively.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
- Developed Spark programs for batch processing.
- Wrote new Spark jobs in Scala to analyze customer data and sales history.
- Worked on Spark SQL and Spark Streaming.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Kafka to get data from many streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Good experience with Hive partitioning, bucketing and collection data types, and with performing different types of joins on Hive tables.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Wrote HBase bulk-load jobs that converted processed data to HFiles and loaded them into HBase tables.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Created Sqoop jobs to handle incremental loads from RDBMSs into HDFS and applied Spark transformations on the loaded data.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing.
- Loaded data from Spark into Hive tables stored in the Parquet columnar format (see the sketch at the end of this section).
- Developed Oozie workflows to automate and productionize the data pipelines.
Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Scala, Oozie, HQL/SQL, Teradata.
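A rough sketch of the Hive-to-Spark-to-Parquet flow referenced above, assuming Spark 2.x's SparkSession with Hive support; the database, table and column names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesHistoryAggregation {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; database, table and column names are hypothetical
    val spark = SparkSession.builder()
      .appName("sales-history-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Read the source Hive table into a DataFrame and aggregate sales per customer
    val sales = spark.table("retail.sales_history")
    val summary = sales
      .filter(col("status") === "COMPLETED")
      .groupBy("customer_id")
      .agg(sum("amount").as("total_spend"), count("order_id").as("order_count"))

    // Persist the result back to Hive as a Parquet-backed table
    summary.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("retail.customer_sales_summary")

    spark.stop()
  }
}
```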
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed Map Reduce programs for data extraction, transformation and aggregation.
- Monitored and troubleshot MapReduce jobs running on the cluster.
- Implemented solutions for ingesting data from various sources and processing it using Hadoop services such as Sqoop, Hive, Pig, HBase and MapReduce.
- Worked on creating combiners, custom partitioners and the distributed cache to improve the performance of MapReduce jobs (see the sketch at the end of this section).
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Experienced in handling Avro data files in MapReduce by passing the Avro schema via HDFS and using Avro tools.
- Optimized MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Orchestrated many Sqoop scripts, Pig scripts, Hive queries using Oozie workflows and sub workflows.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers and pushed to HDFS.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in debugging Map Reduce jobs using MRUnit framework and optimizing Map Reduce jobs.
- Involved in troubleshooting errors in Shell, Hive and Map Reduce.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive and Apache Pig.
- Created Hive external tables over the MapReduce output, then applied partitioning and bucketing on top of them.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, DB2, Flume, ESP, Oozie, Maven, Unix Shell Scripting.
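A compact sketch of the combiner and custom-partitioner wiring mentioned above. The original work was done in Java; this illustrative equivalent is written in Scala against the Hadoop MapReduce Java API, and the input layout, key scheme and paths are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Partitioner, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Emits (event-type, 1) per input record; the tab-delimited layout is hypothetical
class EventTypeMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 1) { outKey.set(fields(1)); ctx.write(outKey, one) }
  }
}

// Used both as combiner and reducer, since summing counts is associative
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val total = values.asScala.map(_.get).sum
    ctx.write(key, new IntWritable(total))
  }
}

// Custom partitioner: route keys to reducers by the key's first character
class FirstCharPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val s = key.toString
    val c = if (s.isEmpty) 0 else s.charAt(0).toInt
    (c & Integer.MAX_VALUE) % numPartitions
  }
}

object EventCountDriver {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "event-type-counts")
    job.setJarByClass(classOf[EventTypeMapper])
    job.setMapperClass(classOf[EventTypeMapper])
    job.setCombinerClass(classOf[SumReducer]) // map-side pre-aggregation
    job.setReducerClass(classOf[SumReducer])
    job.setPartitionerClass(classOf[FirstCharPartitioner])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```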
Confidential, Eden Prairie, MN
Hadoop developer
Responsibilities:
- Communicated effectively with business customers to gather the required information for the project.
- Involved in loading data into HDFS from Teradata using Sqoop.
- Experienced in moving huge amounts of log file data from different servers.
- Worked on implementing complex data transformations using the MapReduce framework.
- Involved in generating structured data through MapReduce jobs, storing it in Hive tables and analyzing the results with Hive queries based on the requirements.
- Worked on performance improvement by implementing dynamic partitioning and buckets in Hive and by designing managed and external tables.
- Worked on developing Pig Latin scripts and used ETL tools such as Informatica for some pre-aggregations.
- Worked on MapReduce programs to cleanse and pre-process data from various sources.
- Worked on SequenceFiles and Avro files in MapReduce programs.
- Created Hive generic UDFs to implement business logic (see the sketch at the end of this section) and worked on incremental imports into Hive tables.
- Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports from.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Loaded processed data into HBase tables using HBase Java API calls.
Environment: Hadoop, Cloudera, MapReduce, Hive, Impala, Pig, HBase, Sqoop, Flume, Oozie, Java, Maven, RHEL and UNIX Shell
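A minimal sketch of a Hive generic UDF of the kind mentioned above, written in Scala against Hive's GenericUDF API; the masking rule and function name are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.metadata.HiveException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{PrimitiveObjectInspectorFactory, StringObjectInspector}

// Masks all but the last four characters of a string column (e.g. account numbers)
class MaskSuffixUDF extends GenericUDF {
  private var inputOI: StringObjectInspector = _

  @throws[UDFArgumentException]
  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1 || !arguments(0).isInstanceOf[StringObjectInspector])
      throw new UDFArgumentException("mask_suffix() expects a single string argument")
    inputOI = arguments(0).asInstanceOf[StringObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  @throws[HiveException]
  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get()
    if (raw == null) return null
    val s = inputOI.getPrimitiveJavaObject(raw)
    if (s.length <= 4) s else ("*" * (s.length - 4)) + s.takeRight(4)
  }

  override def getDisplayString(children: Array[String]): String =
    s"mask_suffix(${children.mkString(", ")})"
}
```

Once packaged into a jar, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION mask_suffix AS '<fully qualified class name>'.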
Confidential, Knoxville, TN
Hadoop Developer
Responsibilities:
- Collaborated with different teams on cluster planning, hardware requirements and network equipment to implement a 9-node Hadoop cluster using the Cloudera distribution.
- Involved in implementation and ongoing administration of Hadoop infrastructure.
- Monitored Hadoop cluster job performance and performed cluster capacity planning.
- Worked on analyzing the Hadoop stack and developed multiple POCs using MapReduce, Pig, Hive, HBase, Sqoop and Flume.
- Good understanding of AWS (Amazon Web Services) EC2, RDS and S3.
- Implemented commissioning and decommissioning of data nodes, killing unresponsive TaskTrackers and dealing with blacklisted TaskTrackers.
- Resolved tickets submitted by users and troubleshot and fixed the documented errors.
- Involved in creating Hive tables, loading and analyzing data using hive queries.
- Copied data from one cluster to another using DistCp (distributed copy).
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented a script to transmit information from Oracle to HBase and Cassandra using Sqoop.
- Assisted in exporting analyzed data to NoSQL DB's Cassandra and HBase using Sqoop.
- Worked on tuning the performance of Hive and Pig queries.
- Performance tuning of Hadoop clusters and Hadoop Map Reduce routines.
- Managed and reviewed Hadoop log files.
- Involved in HDFS maintenance and loading of structured and unstructured data from Linux machines; wrote MapReduce jobs using the Java API and Pig Latin as well.
- Monitored Hadoop cluster connectivity and security.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Worked with application teams to install OS, Hadoop updates, patches, versions upgrade as required.
- Aligned with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Good with Java for writing MapReduce business logic and UDFs for Pig and Hive.
Environment: Cloudera Distribution of Hadoop (CDH4), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, Impala, Kafka
Confidential
Java Developer
Responsibilities:
- Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
- Involved in Requirement Analysis, Development and Documentation.
- Used MVC architecture (Jakarta Struts framework) for Web tier.
- Participated in developing the form beans and action mappings required for the Struts implementation and the validation framework using Struts.
- Developed front-end screens with JSP using Eclipse.
- Involved in development of the Medical Records module; responsible for developing the functionality using Struts and EJB components.
- Coded DAO objects using JDBC, following the DAO pattern.
- Used XML and XSDs to define data formats.
- Implemented J2EE design patterns (Value Object, Singleton, DAO) across the presentation, business and integration tiers of the project.
- Involved in Bug fixing and functionality enhancements.
- Designed and developed a logging mechanism for each order process using Log4j.
- Involved in writing Oracle SQL queries.
- Involved in Check-in and Checkout process using CVS.
- Created SAP Business Objects Reports.
- Developed additional functionality in the software as per business requirements.
- Involved in requirement analysis and complete development of client-side code.
- Followed Sun standard coding and documentation standards.
- Participated in project planning with business analysts and team members to analyze the Business requirements and translated business requirements into working software.
- Developed software application modules using disciplined software development process.
Environment: Java, J2EE, JSP, EJB, Ant, Struts 1.2, Log4j, WebLogic 7.0, JDBC, Eclipse, Windows XP, CVS, Oracle, SAP Business Objects, Netezza
Confidential
Java Developer
Responsibilities:
- Involved in Presentation Tier Development using JSF Framework and ICE Faces tag Libraries.
- Involved in business requirement gathering and technical specifications.
- Implemented J2EE standards, MVC2 architecture using JSF Framework.
- Implemented servlets, JSP and Ajax to design the user interface.
- Extensive experience in building GUI (Graphical User Interface) using JSF and ICE Faces.
- Developed Rich Enterprise Applications using ICE Faces and Portlets technologies.
- Experience using ICE Faces Tag Libraries to develop user interface components.
- All business logic in all modules was written in core Java.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Developed a web-based reporting for monitoring system with HTML and Tiles using Struts framework.
- The middleware services layer was implemented using stateless EJBs (Enterprise JavaBeans) in a WebSphere environment.
- Funds transfers were sent to another application asynchronously using JMS.
- Involved in implementing JMS (Java Message Service) for asynchronous communication.
- Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
Environment: J2EE, EJB, JSF, ICE Faces, Web Services, XML, XSD, Agile, Microsoft Visio, ClearCase, Oracle 9i/10g, WebLogic 8.1/10.3, RAD, Log4j, Servlets, JSP, Unix.