Hadoop Developer / Spark Developer Resume
San Francisco, CA
PROFESSIONAL SUMMARY:
- Over 8 years of professional IT experience, including 5+ years in Big Data, Hadoop development and data analytics, along with design and development of Java-based enterprise applications.
- Very strong knowledge of Hadoop ecosystem components such as HDFS, Spark, Hive, HBase, Sqoop, Kafka, MapReduce, Pig and Oozie.
- Strong knowledge of distributed systems architecture and parallel processing frameworks.
- In-depth understanding of the internals of the MapReduce framework and the Spark execution model.
- Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark MLlib and Spark Streaming APIs.
- Experience with different Hadoop distributions such as Cloudera (CDH 3, 4 and 5), Hortonworks Data Platform (HDP) and Amazon Elastic MapReduce (EMR).
- Worked extensively on fine-tuning long-running Spark applications to achieve better parallelism and free more executor memory for caching.
- Strong experience with both batch and real-time processing using the Spark framework.
- Strong knowledge of performance-tuning Hive queries and troubleshooting issues related to joins and memory exceptions in Hive.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
- Used custom SerDes (Regex SerDe, JSON SerDe, CSV SerDe, etc.) in Hive to handle multiple data formats (see the sketch following this summary).
- Strong experience with Hadoop file formats such as Avro and the columnar RCFile, ORC and Parquet formats.
- Hands-on experience in installing, configuring and deploying Hadoop distributions in cloud environments (Amazon Web Services).
- Experience in optimizing MapReduce jobs using combiners and custom partitioners.
- Expertise in back-end/server-side Java technologies such as web services, Java Persistence API (JPA), Java Message Service (JMS) and Java Database Connectivity (JDBC).
- Experience with NoSQL databases such as HBase, Apache Cassandra and MongoDB, and their integration with Hadoop clusters.
- Experienced in writing custom MapReduce programs and UDFs in Java to extend Hive and Pig core functionality.
- Extensive experience in ETL processes covering data sourcing, mapping, transformation, conversion and loading.
- Created Talend mappings to populate data into dimension and fact tables.
- Broad design, development and testing experience with Talend Integration Suite, and knowledge of performance tuning of mappings.
- Worked with Sqoop to move (import/export) data between relational databases and Hadoop.
- Experience working with Hadoop clusters using Cloudera, Amazon AWS and Hortonworks distributions.
- Experience in installation, configuration, support and management of a Hadoop Cluster.
- Knowledge of UNIX shell scripting for automating deployments and other routine tasks.
- Experienced in agile methodologies including Extreme Programming, Scrum and Test-Driven Development (TDD).
- Proficient in integrating and configuring the object-relational mapping tool Hibernate in J2EE applications, along with other open-source frameworks such as Struts and Spring.
- Experience in building and deploying web applications on multiple application servers and middleware platforms, including WebLogic, WebSphere, Apache Tomcat and JBoss.
- Experience in writing test cases in Java Environment using JUnit.
- Hands-on experience in developing logging standards and mechanisms based on Log4j.
- Experience in building, deploying and integrating applications with Ant and Maven.
- Good knowledge of REST and SOAP web services, WSDL, and XML parsers such as SAX.
- Flexible, enthusiastic and project-oriented team player with excellent communication and leadership skills, able to develop creative solutions for challenging client requirements.
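A minimal sketch of the partitioned external Hive table pattern with a JSON SerDe described above, written in Scala with a Hive-enabled SparkSession; the table, columns and HDFS path are hypothetical placeholders rather than details from a specific engagement, and Hive's bundled JSON SerDe (hive-hcatalog-core) is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object ClickstreamTableSetup {
  def main(args: Array[String]): Unit = {
    // Hive-enabled SparkSession; all names and paths below are illustrative
    val spark = SparkSession.builder()
      .appName("clickstream-table-setup")
      .enableHiveSupport()
      .getOrCreate()

    // External table over raw JSON events, partitioned by ingestion date,
    // using Hive's JSON SerDe to parse each line
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
        |  user_id   STRING,
        |  event     STRING,
        |  event_ts  BIGINT
        |)
        |PARTITIONED BY (dt STRING)
        |ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
        |STORED AS TEXTFILE
        |LOCATION '/data/raw/events'""".stripMargin)

    // Register a newly landed partition so queries can see it
    spark.sql("ALTER TABLE raw_events ADD IF NOT EXISTS PARTITION (dt='2017-01-01')")

    spark.stop()
  }
}
```

When many partitions land at once, MSCK REPAIR TABLE can be used instead of adding partitions one by one.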
TECHNICAL SKILLS:
Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL, Shell Scripting
BigData/Hadoop Technologies: HDFS, YARN, MapReduce, Hive, Pig, Impala, Sqoop, Flume, Spark, Kafka, Storm, Drill, Zookeeper and Oozie
NoSQL Databases: HBase, Cassandra, MongoDB
J2EE/Middleware: J2EE (Servlets 2.4, JSP 2.0, JDBC, JMS)
Database: Microsoft SQL Server, MySQL, Oracle, DB2
Cloud Computing Tools: Amazon AWS
Development Tools: Microsoft SQL Studio, Eclipse, IntelliJ
Development Methodologies: Agile/Scrum, Waterfall
GUI Technologies: HTML, XHTML, CSS, JavaScript, Ajax, AngularJs
Web/App Servers: WebLogic, WebSphere
Operating Systems: UNIX, Windows, Mac, Linux
Office Suite: Microsoft Office (Word/Excel/PowerPoint)
PROFESSIONAL EXPERIENCE:
Confidential, San Francisco, CA
Hadoop Developer / Spark Developer
Responsibilities:
- Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
- Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
- Developed many Spark applications for data cleansing, event enrichment, aggregation, de-normalization and preparation of data needed for machine learning.
- Worked on troubleshooting Spark applications to make them more fault tolerant.
- Worked on fine-tuning Spark applications to improve the overall processing time of the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase (see the sketch at the end of this section).
- Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations and other optimizations.
- Worked extensively with Sqoop for importing data from Oracle.
- Experience working with EMR clusters in the AWS cloud and with S3.
- Involved in creating Hive tables and loading and analyzing data using Hive scripts.
- Implemented partitioning, dynamic partitions and buckets in Hive.
- Good experience with continuous integration of applications using Jenkins.
- Used reporting tools such as Tableau, connected to Impala, to generate daily data reports.
- Collaborated with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
Environment: Hadoop YARN, Spark-Core, Spark Streaming, Spark SQL, Scala, Kafka, Hive, Sqoop, Amazon AWS, HBase, Tableau, Oozie, Oracle, Linux
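A rough sketch of the Kafka-to-HBase streaming path referenced above, assuming the spark-streaming-kafka-0-10 integration and the HBase 1.x client API; the broker, topic, group, table and column-family names are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object EventStreamToHBase {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(new SparkConf().setAppName("events-to-hbase"), Seconds(10))

    // Broker, topic and group names below are illustrative
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "event-consumers",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("user-events"), kafkaParams))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // One HBase connection per partition rather than per record
        val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
        val table = conn.getTable(TableName.valueOf("user_events"))
        records.foreach { rec =>
          // Fall back to a random row key if the producer did not set a message key
          val rowKey = Option(rec.key()).getOrElse(java.util.UUID.randomUUID().toString)
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("e"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
          table.put(put)
        }
        table.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```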
Confidential, Minneapolis, MN
Hadoop/Spark Developer
Responsibilities:
- Used Cloudera distribution extensively.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
- Developed Spark programs for batch processing.
- Wrote new Spark jobs in Scala to analyze customer data and sales history.
- Worked on Spark SQL and Spark Streaming.
- Developed Spark scripts using Scala shell commands as per requirements.
- Used Kafka to get data from many streaming sources into HDFS.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Good experience with Hive partitioning, bucketing and collection data types, and with performing different types of joins on Hive tables.
- Created Hive external tables to perform ETL on data generated on a daily basis.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Wrote HBase bulk-load jobs that converted processed data to HFiles and loaded them into HBase tables.
- Performed validation on the data ingested to filter and cleanse the data in Hive.
- Used Spark SQL with Scala to create DataFrames and performed transformations on them.
- Created Sqoop jobs to handle incremental loads from RDBMSs into HDFS and applied Spark transformations on the loaded data.
- Implemented Spark SQL to access Hive tables from Spark for faster data processing.
- Loaded data from Spark into Hive tables stored in the Parquet columnar format (see the sketch at the end of this section).
- Developed Oozie workflows to automate and productionize the data pipelines.
Environment: Hadoop, Hive, Flume, Shell Scripting, Java, Eclipse, HBase, Kafka, Spark, Spark Streaming, Scala, Oozie, HQL/SQL, Teradata.
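A rough sketch of the Hive-to-Spark-to-Parquet flow referenced above, assuming Spark 2.x's SparkSession with Hive support; the database, table and column names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SalesHistoryAggregation {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session; database, table and column names are hypothetical
    val spark = SparkSession.builder()
      .appName("sales-history-aggregation")
      .enableHiveSupport()
      .getOrCreate()

    // Read the source Hive table into a DataFrame and aggregate sales per customer
    val sales = spark.table("retail.sales_history")
    val summary = sales
      .filter(col("status") === "COMPLETED")
      .groupBy("customer_id")
      .agg(sum("amount").as("total_spend"), count("order_id").as("order_count"))

    // Persist the result back to Hive as a Parquet-backed table
    summary.write
      .mode("overwrite")
      .format("parquet")
      .saveAsTable("retail.customer_sales_summary")

    spark.stop()
  }
}
```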
Confidential, Chicago, IL
Hadoop Developer
Responsibilities:
- Developed Map Reduce programs for data extraction, transformation and aggregation.
- Monitored and troubleshot MapReduce jobs running on the cluster.
- Implemented solutions for ingesting data from various sources and processing it using Hadoop services such as Sqoop, Hive, Pig, HBase and MapReduce.
- Worked on creating combiners, custom partitioners and the distributed cache to improve the performance of MapReduce jobs (see the sketch at the end of this section).
- Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Experienced in handling Avro data files in MapReduce by passing the Avro schema via HDFS and using Avro tools.
- Optimized MapReduce jobs using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Orchestrated many Sqoop scripts, Pig scripts, Hive queries using Oozie workflows and sub workflows.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers and pushed to HDFS.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Involved in debugging Map Reduce jobs using MRUnit framework and optimizing Map Reduce jobs.
- Involved in troubleshooting errors in Shell, Hive and Map Reduce.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Designed and implemented MapReduce jobs to support distributed processing using MapReduce, Hive and Apache Pig.
- Created Hive external tables over the MapReduce output, then applied partitioning and bucketing on top of them.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, HBase, DB2, Flume, ESP, Oozie, Maven, Unix Shell Scripting.
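A compact sketch of the combiner and custom-partitioner wiring mentioned above. The original work was done in Java; this illustrative equivalent is written in Scala against the Hadoop MapReduce Java API, and the input layout, key scheme and paths are hypothetical.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mapreduce.{Job, Mapper, Partitioner, Reducer}
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import scala.collection.JavaConverters._

// Emits (event-type, 1) per input record; the tab-delimited layout is hypothetical
class EventTypeMapper extends Mapper[LongWritable, Text, Text, IntWritable] {
  private val one = new IntWritable(1)
  private val outKey = new Text()
  override def map(key: LongWritable, value: Text,
                   ctx: Mapper[LongWritable, Text, Text, IntWritable]#Context): Unit = {
    val fields = value.toString.split("\t")
    if (fields.length > 1) { outKey.set(fields(1)); ctx.write(outKey, one) }
  }
}

// Used both as combiner and reducer, since summing counts is associative
class SumReducer extends Reducer[Text, IntWritable, Text, IntWritable] {
  override def reduce(key: Text, values: java.lang.Iterable[IntWritable],
                      ctx: Reducer[Text, IntWritable, Text, IntWritable]#Context): Unit = {
    val total = values.asScala.map(_.get).sum
    ctx.write(key, new IntWritable(total))
  }
}

// Custom partitioner: route keys to reducers by the key's first character
class FirstCharPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val s = key.toString
    val c = if (s.isEmpty) 0 else s.charAt(0).toInt
    (c & Integer.MAX_VALUE) % numPartitions
  }
}

object EventCountDriver {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "event-type-counts")
    job.setJarByClass(classOf[EventTypeMapper])
    job.setMapperClass(classOf[EventTypeMapper])
    job.setCombinerClass(classOf[SumReducer]) // map-side pre-aggregation
    job.setReducerClass(classOf[SumReducer])
    job.setPartitionerClass(classOf[FirstCharPartitioner])
    job.setOutputKeyClass(classOf[Text])
    job.setOutputValueClass(classOf[IntWritable])
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}
```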
Confidential, Eden Prairie, MN
Hadoop developer
Responsibilities:
- Communicated effectively with business customers to gather the required information for the project.
- Involved in loading data into HDFS from Teradata using Sqoop.
- Experienced in moving huge amounts of log file data from different servers.
- Worked on implementing complex data transformations using the MapReduce framework.
- Involved in generating structured data through MapReduce jobs, storing it in Hive tables and analyzing the results with Hive queries based on the requirements.
- Worked on performance improvement by implementing dynamic partitioning and buckets in Hive and by designing managed and external tables.
- Worked on developing Pig Latin scripts and used ETL tools such as Informatica for some pre-aggregations.
- Worked on MapReduce programs to cleanse and pre-process data from various sources.
- Worked on SequenceFiles and Avro files in MapReduce programs.
- Created Hive generic UDFs to implement business logic (see the sketch at the end of this section) and worked on incremental imports into Hive tables.
- Exported the analyzed data to relational databases using Sqoop for the BI team to visualize and generate reports from.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
- Loaded processed data into HBase tables using HBase Java API calls.
Environment: Hadoop, Cloudera, MapReduce, Hive, Impala, Pig, HBase, Sqoop, Flume, Oozie, Java, Maven, RHEL and UNIX Shell
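A minimal sketch of a Hive generic UDF of the kind mentioned above, written in Scala against Hive's GenericUDF API; the masking rule and function name are hypothetical.

```scala
import org.apache.hadoop.hive.ql.exec.UDFArgumentException
import org.apache.hadoop.hive.ql.metadata.HiveException
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF.DeferredObject
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector
import org.apache.hadoop.hive.serde2.objectinspector.primitive.{PrimitiveObjectInspectorFactory, StringObjectInspector}

// Masks all but the last four characters of a string column (e.g. account numbers)
class MaskSuffixUDF extends GenericUDF {
  private var inputOI: StringObjectInspector = _

  @throws[UDFArgumentException]
  override def initialize(arguments: Array[ObjectInspector]): ObjectInspector = {
    if (arguments.length != 1 || !arguments(0).isInstanceOf[StringObjectInspector])
      throw new UDFArgumentException("mask_suffix() expects a single string argument")
    inputOI = arguments(0).asInstanceOf[StringObjectInspector]
    PrimitiveObjectInspectorFactory.javaStringObjectInspector
  }

  @throws[HiveException]
  override def evaluate(arguments: Array[DeferredObject]): AnyRef = {
    val raw = arguments(0).get()
    if (raw == null) return null
    val s = inputOI.getPrimitiveJavaObject(raw)
    if (s.length <= 4) s else ("*" * (s.length - 4)) + s.takeRight(4)
  }

  override def getDisplayString(children: Array[String]): String =
    s"mask_suffix(${children.mkString(", ")})"
}
```

Once packaged into a jar, a UDF like this would be registered in Hive with ADD JAR followed by CREATE TEMPORARY FUNCTION mask_suffix AS '<fully qualified class name>'.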
Confidential, Knoxville, TN
Hadoop Developer
Responsibilities:
- Collaborated with different teams on cluster planning, hardware requirements and network equipment to implement a 9-node Hadoop cluster using the Cloudera distribution.
- Involved in implementation and ongoing administration of Hadoop infrastructure.
- Monitored Hadoop cluster job performance and performed cluster capacity planning.
- Worked on analyzing the Hadoop stack and developed multiple POCs using MapReduce, Pig, Hive, HBase, Sqoop and Flume.
- Good understanding of AWS (Amazon Web Services) EC2, RDS and S3.
- Implemented commissioning and decommissioning of data nodes, killing unresponsive TaskTrackers and dealing with blacklisted TaskTrackers.
- Resolved tickets submitted by users and troubleshot and fixed the documented errors.
- Involved in creating Hive tables, loading and analyzing data using hive queries.
- Copied data from one cluster to another using DistCp (distributed copy).
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented a script to transmit information from Oracle to HBase and Cassandra using Sqoop.
- Assisted in exporting analyzed data to NoSQL DB's Cassandra and HBase using Sqoop.
- Worked on tuning the performance of Hive and Pig queries.
- Performance tuning of Hadoop clusters and Hadoop Map Reduce routines.
- Managed and reviewed Hadoop log files.
- Involved in HDFS maintenance and loading of structured and unstructured data from Linux machines; wrote MapReduce jobs using the Java API and Pig Latin as well.
- Monitored Hadoop cluster connectivity and security.
- Implemented the Fair Scheduler on the JobTracker to share cluster resources among users' MapReduce jobs.
- Worked with application teams to install OS, Hadoop updates, patches, versions upgrade as required.
- Aligned with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
- Good with Java for writing MapReduce business logic and UDFs for Pig and Hive.
Environment: Cloudera Distribution of Hadoop (CDH4), HDFS, MapReduce, Hive, Pig, Sqoop, Flume, HBase, Oozie, Impala, Kafka
Confidential
Java Developer
Responsibilities:
- Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
- Involved in Requirement Analysis, Development and Documentation.
- Used MVC architecture (Jakarta Struts framework) for Web tier.
- Participated in developing the form beans and action mappings required for the Struts implementation and the validation framework using Struts.
- Developed front-end screens with JSP using Eclipse.
- Involved in development of the Medical Records module; responsible for developing the functionality using Struts and EJB components.
- Coded DAO objects using JDBC, following the DAO pattern.
- Used XML and XSDs to define data formats.
- Implemented J2EE design patterns (Value Object, Singleton, DAO) across the presentation, business and integration tiers of the project.
- Involved in Bug fixing and functionality enhancements.
- Designed and developed a logging mechanism for each order process using Log4j.
- Involved in writing Oracle SQL queries.
- Involved in Check-in and Checkout process using CVS.
- Created SAP Business Objects Reports.
- Developed additional functionality in the software as per business requirements.
- Involved in requirement analysis and complete development of client-side code.
- Followed Sun standard coding and documentation standards.
- Participated in project planning with business analysts and team members to analyze the Business requirements and translated business requirements into working software.
- Developed software application modules using disciplined software development process.
Environment: Java, J2EE, JSP, EJB, Ant, Struts 1.2, Log4j, WebLogic 7.0, JDBC, Eclipse, Windows XP, CVS, Oracle, SAP Business Objects, Netezza
Confidential
Java Developer
Responsibilities:
- Involved in Presentation Tier Development using JSF Framework and ICE Faces tag Libraries.
- Involved in business requirement gathering and technical specifications.
- Implemented J2EE standards, MVC2 architecture using JSF Framework.
- Implemented servlets, JSP and Ajax to design the user interface.
- Extensive experience in building GUI (Graphical User Interface) using JSF and ICE Faces.
- Developed Rich Enterprise Applications using ICE Faces and Portlets technologies.
- Experience using ICE Faces Tag Libraries to develop user interface components.
- All business logic in all modules was written in core Java.
- Wrote Web Services using SOAP for sending and getting data from the external interface.
- Developed a web-based reporting for monitoring system with HTML and Tiles using Struts framework.
- The middleware services layer was implemented using stateless EJBs (Enterprise JavaBeans) in a WebSphere environment.
- Funds transfers were sent to another application asynchronously using JMS.
- Involved in implementing JMS (Java Message Service) for asynchronous communication.
- Created stored procedures using PL/SQL for data modification (DML insert, update, delete) in Oracle.
Environment: J2EE, EJB, JSF, ICE Faces, Web Services, XML, XSD, Agile, Microsoft Visio, ClearCase, Oracle 9i/10g, WebLogic 8.1/10.3, RAD, Log4j, Servlets, JSP, Unix.