Hadoop/Spark Developer Resume
New York City, NY
PROFESSIONAL SUMMARY:
- IT professional with 8+ years of experience in the design, development, deployment and maintenance of data analytics applications in the health, insurance, finance and retail sectors.
- 5+ years of experience building high-performance Big Data applications, primarily using Hadoop ecosystem tools and the Spark framework.
- Solid understanding of the architecture of Hadoop ecosystem components: MapReduce, Pig, Hive, HBase, Flume, Sqoop, Hue, Oozie, Spark and Kafka.
- Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP) and Amazon EMR.
- Expertise in developing production-ready Spark applications using the Spark Core, DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
- Strong experience troubleshooting failures in Spark applications and fine-tuning them for better performance.
- Experience using DStreams in Spark Streaming, accumulators, broadcast variables, various levels of caching and other optimization techniques in Spark.
- Strong experience working with both batch and real-time processing using Spark framework.
- Worked extensively on Hive for building complex data analytical applications.
- Strong experience writing complex MapReduce jobs, including development of custom InputFormats and custom RecordReaders.
- Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
- Worked extensively on Sqoop for performing bulk and incremental ingestion of large datasets from Teradata to HDFS.
- Good experience working with AWS cloud services such as S3, EMR, Redshift and Athena.
- Deep understanding of performance tuning and partitioning for optimizing Spark applications.
- Worked on building real-time data workflows using Kafka, Spark Streaming and HBase.
- Extensive knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
- Strong working experience in extracting, wrangling, ingesting, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
- Solid understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
- Solid experience working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
- Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
- Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and storing data.
- Created Talend mappings to populate dimension and fact tables.
- Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
- Experienced in job workflow scheduling and monitoring tools like Oozie.
- Proficient in writing shell scripts on Linux, with hands-on experience.
- Expertise in core Java and object-oriented design.
- Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML and HTML.
- Extensive experience developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
- Development experience with RDBMSs, including writing SQL queries, views, stored procedures and triggers.
- Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).
TECHNICAL SKILLS:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Hue, Ambari, ZooKeeper, Kafka, Apache Spark, Storm
Hadoop Distributions: CDH, HDP, AWS EMR
Languages: Java, Scala, Python, SQL, Pig Latin, HiveQL
IDE Tools: Eclipse, NetBeans, IntelliJ
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful
Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS
Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Talend
Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB)
Build Automation Tools: SBT, Ant, Maven
PROFESSIONAL EXPERIENCE:
Confidential, New York City, NY
Hadoop/Spark Developer
Responsibilities:
- Developed custom input adapters in Java for moving the data from raw sources (FTP, S3) to HDFS.
- Developed Spark applications using Scala to perform data cleansing, data validation, data transformations and other enrichments.
- Worked extensively on making the Spark applications production-ready by applying best practices for scalability and fault tolerance.
- Developed many Spark applications for the data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning.
- Troubleshot Spark applications to make them more fault tolerant.
- Fine-tuned Spark applications to improve the overall processing time of the pipelines.
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Wrote Spark Streaming applications to consume data from Kafka topics and write the processed streams to HBase.
- Used the DataFrame and Spark SQL APIs extensively.
- Built a data pipeline consisting of Sqoop, custom-built input adapters, Spark and Hive.
- Performed Hive data modeling and wrote many Hive scripts for the data preparation needed to run machine learning models.
- Worked closely with the data science team on automating and productionizing various models, such as logistic regression and k-means, using Spark ML.
- Built a working prototype of a real-time workflow for streaming user events from external applications.
- Used Kafka and Spark Streaming to build the real-time pipeline.
- Converted existing MapReduce jobs to Spark jobs.
- Developed Oozie workflows to automate and productionize the data pipelines.
- Implemented Logistic Regression and K-Means models and automated them to run in production.
Environment: Cloudera Distribution, Hadoop, HDFS, Spark, Scala, Kafka, HBase, Oozie, Hive, Flume, Sqoop, Java, SQL, Oracle 11g, Unix/Linux
Confidential, Bethlehem, PA
Hadoop Developer
Responsibilities:
- Part of the Big Data Center of Excellence (CoE), responsible for designing and building enterprise data analytics solutions.
- Worked with respective business units in understanding the scope of the analytics requirements.
- Performed core ETL transformations in Spark.
- Automated data pipelines which involve data ingestion, data cleansing, data preparation and data analytics.
- Created end-to-end Spark applications in Scala to perform various data cleansing, validation, transformation and summarization activities on user behavioral data.
- Converted existing MapReduce jobs into Spark transformations and actions using Spark RDDs, DataFrames and the Spark SQL API.
- Developed an end-to-end data pipeline using an FTP adapter, Spark, Hive and Impala.
- Used Scala to write code for all Spark use cases.
- Implemented design patterns in Scala for the application.
- Implemented Spark jobs in Scala and used Spark SQL heavily for faster development and data processing.
- Explored Spark for improving the performance of, and optimizing, existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark SQL and Scala.
- Imported enterprise data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HBase tables.
- Exported the analyzed data to relational databases using Sqoop for further visualization and report generation for the BI team.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created Hive UDFs to supply functionality missing from Hive for analytics.
- Worked on various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Created Oozie workflows and coordinators to run data pipelines on daily, weekly and monthly schedules.
- Automated cluster creation and termination in AWS.
Environment: Hortonworks, HDFS, Hive, Sqoop, Flume, Spark, Scala, HBase, Kafka, Impala, Oozie, Oracle 11g, YARN, UNIX Shell Scripting, Agile Methodology
Confidential, Minnetonka, MN
Hadoop Developer
Responsibilities:
- Worked with the business team to gather the requirements and participated in the Agile planning meetings to finalize the scope of each development.
- Responsible for building scalable distributed data solutions on CDH.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Implemented data pipelines by chaining multiple mappers using the ChainMapper API.
- Developed multiple MapReduce batch jobs in Java for loading data into HDFS in SequenceFile format.
- Ingested structured data from a wide array of RDBMSs into HDFS as incremental imports using Sqoop.
- Involved in writing Pig scripts to wrangle the raw data and store it in HDFS, and to load the data into Hive tables using HCatalog.
- Configured Flume agents on different data sources to capture the streaming log data from the web servers.
- Implemented Flume (multiplexing) to stream data from upstream pipes into HDFS.
- Created Hive external tables with clustering and partitioning on the date for optimizing the performance of ad-hoc queries.
- Involved in writing HiveQL scripts using Beeline, Impala and the Hive CLI for consumer data analysis to meet business requirements.
- Exported data from Hive to DWH using Sqoop.
- Worked with different file formats and compression techniques to ensure optimal performance of Hive queries.
- Involved in creating Hive tables from a wide range of data formats, such as CSV, text, SequenceFile, Avro, Parquet, ORC, JSON and custom formats, using SerDes.
- Transformed the semi-structured log data to fit into the schema of the Hive tables using Pig.
- Involved in scheduling Oozie workflows to run multiple Hive and Pig jobs.
- Involved in testing and in preparing low-level and high-level design documentation for the business requirements.
Environment: Cloudera Hadoop, Eclipse, MapReduce, Java, Sqoop, Pig, Oozie, Hive, Flume, CentOS, MySQL, Oracle DB.
Confidential, Overland Park, KS
Hadoop Engineer
Responsibilities:
- Imported data from relational data sources such as Teradata and Oracle into HDFS using Sqoop.
- Bulk-loaded data into HBase tables using MapReduce programs.
- Inserted time series data into HBase using the HBase Java API.
- Designed and implemented incremental imports into Hive tables.
- Used the REST API to access HBase data for analytics.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Managed and reviewed Hadoop log files.
- Migrated ETL jobs to Pig scripts to perform transformations, joins and some pre-aggregations before storing the data in HDFS.
- Used the Avro data serialization system to work with JSON data formats.
- Worked on different file formats, such as SequenceFiles, XML files and MapFiles, using MapReduce programs.
- Involved in unit testing and delivered unit test plans and results documents using JUnit and MRUnit.
- Exported data from HDFS into RDBMSs using Sqoop for report generation and visualization.
- Used the Oozie workflow engine for job scheduling.
Environment: Hadoop, HDFS, MapReduce, Hive, Teradata, Oozie, Sqoop, Pig, Java, REST API, Maven, MRUnit, JUnit.
Confidential, Richfield, MN
Java/Hadoop Developer
Responsibilities:
- Developed web applications by coordinating requirements, user stories, use cases, screen mockups, schedules, and activities.
- Worked closely with client business stakeholders on Agile development teams.
- Supported users by developing documentation and assistance tools.
- Developed the presentation layer using the Spring Framework, using multiple Spring modules such as Spring MVC and JDBC.
- Implemented RESTful web services with Jersey to integrate different application components.
- Developed RESTful Web services for transmission of data in JSON/XML format.
- Involved in writing SQL queries, functions, views, triggers and stored procedures for an Oracle relational database.
- Used Sqoop to ingest structured data from Oracle database to HDFS.
- Involved in writing and running MapReduce batch jobs in Java for data wrangling on the cluster.
- Developed map-side and reduce-side joins using the distributed cache on various datasets.
- Developed Pig Latin scripts to transform the data according to the business requirements.
- Developed Pig UDFs in Java, extending eval and filter functions, to filter semi-structured data.
Environment: Java, J2EE, Eclipse, JSP, Servlets, Spring, JavaScript, HTML, RESTful, shell scripting, XML, Oracle 10g, Cloudera Hadoop, MapReduce, Pig, HDFS.
Confidential
Java/J2EE Developer
Responsibilities:
- Involved in analysis, design and development of web applications based on J2EE.
- Used the Struts framework for managing navigation and page flow.
- Developed an EJB session bean that acts as a facade and accesses the business entities through their local home interfaces.
- Designed the user interface using HTML, CSS, JavaScript and jQuery.
- Used Log4j to debug and generate new logs for the application.
- Used JDBC for accessing data from the Oracle database. Created database tables and stored procedures using PL/SQL in Oracle DB.
- Implemented client-side validation on web forms as per the requirements.
- Developed code to convert JSON data to custom JavaScript objects.
- Developed Servlets and JSPs based on the MVC pattern using the Struts framework.
- Provided support for production and implementation issues. Involved in end-user/client training on the application.
- Performed Unit Tests on the application to verify and identify various scenarios.
- Used Eclipse for development, testing and code review.
- Involved in the release management process to QA/UAT/Production regions.
- Used Maven to build the application EAR for deployment on WebLogic application servers.
- Developed the project in an Agile environment.
Environment: J2EE, Java, Eclipse, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.
Confidential
Java Developer
Responsibilities:
- Involved in various phases of the Software Development Life Cycle (SDLC) of the application, such as requirements gathering, design, development and documentation.
- Designed the application using J2EE design patterns and technologies based on the MVC architecture.
- Involved in designing the user interfaces using HTML, CSS and JavaScript for client-side validation.
- Developed custom tags and used JSTL to support custom user interfaces.
- Handled business logic in the model using helper classes, and used Servlets as the controller to manage application flow and server-side validation.
- Involved in Servlet and JavaBean programming on the server side for communication between clients and the server.
- Involved in implementing RESTful web services to integrate different applications (internal and third-party components).
- Involved in writing unit tests covering positive and negative test cases.
- Developed the Maven scripts for preparing WAR files used to deploy J2EE components.
- Created tables, views, triggers, stored procedures on MySQL server for data manipulation and retrieval.
- Used JDBC for database connectivity and to invoke stored procedures.
- Used Log4j to capture logs, including runtime exceptions.
- Involved in bug fixing and functionality enhancements.
- Developed the project using the Waterfall model.
Environment: J2EE, Java, UNIX, Red Hat, PuTTY, MVC, JSP, JDBC, Eclipse IDE, Apache Tomcat, CSS, HTML, JavaScript, SQL Server.
