
Sr. Hadoop / Spark Developer Resume


Bellevue, WA

SUMMARY:

  • 8+ years of Industry experience in Software development, Data Analysis and Hadoop Ecosystem technologies.
  • Solid understanding of Data Modeling, Evaluating Data Sources and strong understanding of Data Warehouse/Data Mart Design, ETL, BI, OLAP, Client/Server applications.
  • Experience in analyzing data using the Hadoop ecosystem, including HDFS, Hive, Spark, Spark Streaming, Elasticsearch, Kibana, Kafka, HBase, Zookeeper, PIG, Sqoop, Flume.
  • Strong experience in Data Extraction, Transformation and Loading (ETL) from multiple sources like Excel, MS Access, XML, Oracle and DB2 to MS SQL Server using SQL Server Integration Services (SSIS), DTS, Bulk Insert and BCP.
  • Experience working with data modeling tools like Erwin, Power Designer and ER Studio.
  • Experience in developing MapReduce Programs using Apache Hadoop for analyzing the big data as per the requirement.
  • Working experience in designing and implementing complete end-to-end Hadoop infrastructure including PIG, HIVE, Sqoop, Oozie, Flume and ZooKeeper.
  • Experience in working with various Cloudera distributions (CDH4/CDH5), Hortonworks and Amazon EMR Hadoop distributions.
  • Experience in automating Hadoop installation, configuration and cluster maintenance using tools like Puppet.
  • Used Spark API over Hortonworks Hadoop YARN to perform analytics on data in Hive.
  • Experience in designing Star schema, Snowflake schema for Data Warehouse, ODS architecture.
  • Experience in Requirement gathering, System analysis, handling business and technical issues & communicating with both business and technical users.
  • Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and from RDBMS to HDFS.
  • Experience in data analysis using Hive, Pig Latin, and Impala.
  • Experience in Hadoop distributions like Cloudera, Hortonworks, BigInsights, MapR and Windows Azure, and in Impala.
  • Well versed in Normalization / De-normalization techniques for optimum performance in relational and dimensional database environments.
  • Experienced in various Teradata utilities like FastLoad, MultiLoad, BTEQ, and Teradata SQL Assistant.
  • Excellent understanding of the Software Development Life Cycle (SDLC) with good working knowledge of testing methodologies, disciplines, tasks, resources and scheduling.
  • Excellent knowledge in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
  • Experience in designing error and exception handling procedures to identify, record and report errors.
  • Excellent knowledge of Perl & UNIX.
  • Good exposure to working in an offshore/onsite model, with the ability to understand and/or create functional requirements in collaboration with the client.
  • Excellent in creating various artifacts for projects, including specification documents, data mapping and data analysis documents.
  • An excellent team player and technically strong person, able to work with business users, project managers, team leads, architects and peers, thus maintaining a healthy environment in the project.

TECHNICAL SKILLS:

Analysis and Modeling Tools: Erwin 9.6/9.5, Sybase Power Designer, Oracle Designer, ER/Studio 9.7

Database Tools: Microsoft SQL Server 2014/2012, Teradata 15/14, Oracle 12c/11g, MS Access, PostgreSQL, Netezza.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9.

ETL Tools: SSIS, Pentaho, Informatica PowerCenter 9.6, SAP Business Objects XIR3.1/XIR2, Web Intelligence

Operating System: Windows, DOS, UNIX

Reporting Tools: Business Objects, Crystal Reports

Web technologies: HTML, DHTML, XML, JavaScript

Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant

Big Data: Hadoop, HDFS 2, MapReduce, YARN, Hive, Spark-SQL, PIG, HBase, Sqoop, Kafka, Oozie, Flume.

AWS: EC2, S3, SQS.

Other tools: TOAD, SQL*Plus, SQL*Loader, MS Project, MS Visio and MS Office; have also worked with C++, UNIX, PL/SQL, Docker, etc.

Languages: Java, Scala, Python

PROFESSIONAL EXPERIENCE:

Confidential, Bellevue, WA

Sr. Hadoop / Spark Developer

Responsibilities:

  • Responsible for Big Data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Involved in the complete project life cycle, starting from design discussions to production deployment.
  • Analyzed database systems to provide solutions and recommendations for General Data Protection Regulation (GDPR) / data privacy regulations; these tasks covered SQL, Oracle, MySQL, NoSQL and Hadoop systems.
  • Developed Spark applications using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data (see the Spark sketch after this list).
  • Developed highly optimized Spark applications to perform various data cleansing, validation, transformation and summarization activities according to the requirements.
  • Built a data pipeline consisting of Spark, Hive, Sqoop and custom-built input adapters to ingest, transform and analyze operational data.
  • Developed Spark jobs and Hive Jobs to summarize and transform data.
  • Worked on Cassandra to maintain large data volumes and versions.
  • Involved in data modeling for the Cassandra schema.
  • Used Spark for interactive queries, processing of streaming data and integration with a popular NoSQL database for huge volumes of data.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Created a Kafka application that monitors consumer lag within Apache Kafka clusters; used in production by multiple report suites (see the Kafka sketch after this list).
  • Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the processed data back into HDFS.
  • Created RDDs and applied data filters in Spark, and created Cassandra tables and Hive tables for user access.
  • Exported the analyzed data to the relational databases using Sqoop, to further visualize and generate reports for the BI team.
  • Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
  • Designed and implemented a Hive data warehouse system in a Hortonworks environment.
  • Analyzed the data by performing Hive queries (HiveQL) to study customer behavior.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Developed Hive scripts in HiveQL to de-normalize and aggregate the data.
  • Created HBase tables and column families to store the user event data.
  • Scheduled and executed workflows in Oozie to run various jobs.
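
A minimal sketch, in Scala, of the kind of Hive-to-Spark DataFrame processing described in the bullets above; the table and column names (raw.customer_events, analytics.customer_event_summary, customer_id, event_ts, amount) are hypothetical placeholders rather than the actual project schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object CustomerEventSummary {
  def main(args: Array[String]): Unit = {
    // Hive-enabled session so Spark SQL can read and write the warehouse tables
    val spark = SparkSession.builder()
      .appName("CustomerEventSummary")
      .enableHiveSupport()
      .getOrCreate()

    // Read a (hypothetical) Hive table as a DataFrame
    val events = spark.table("raw.customer_events")

    // Cleanse, validate and summarize: drop incomplete records,
    // then aggregate per customer per day
    val summary = events
      .filter(col("customer_id").isNotNull && col("event_ts").isNotNull)
      .withColumn("event_dt", to_date(col("event_ts")))
      .groupBy("customer_id", "event_dt")
      .agg(count("*").as("event_count"),
           sum("amount").as("total_amount"))

    // Write the result back to Hive for BI / reporting access
    summary.write.mode("overwrite").saveAsTable("analytics.customer_event_summary")

    spark.stop()
  }
}
```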
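
A similarly hedged sketch of how the Kafka consumer-lag monitor could be structured: the AdminClient reads the group's committed offsets and a standalone consumer looks up the current end offsets. The broker address and group name are illustrative, not the production configuration.

```scala
import java.util.Properties
import scala.collection.JavaConverters._
import org.apache.kafka.clients.admin.{AdminClient, AdminClientConfig}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer

object ConsumerLagCheck {
  def main(args: Array[String]): Unit = {
    val brokers = "localhost:9092"          // illustrative broker list
    val groupId = "report-suite-consumer"   // hypothetical consumer group

    // AdminClient returns the group's committed offsets without joining the group
    val adminProps = new Properties()
    adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, brokers)
    val admin = AdminClient.create(adminProps)
    val committed = admin.listConsumerGroupOffsets(groupId)
      .partitionsToOffsetAndMetadata().get().asScala

    // A standalone consumer (no group.id) is used only to fetch the log end offsets
    val consumerProps = new Properties()
    consumerProps.put("bootstrap.servers", brokers)
    consumerProps.put("key.deserializer", classOf[ByteArrayDeserializer].getName)
    consumerProps.put("value.deserializer", classOf[ByteArrayDeserializer].getName)
    val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](consumerProps)
    val endOffsets = consumer.endOffsets(committed.keySet.asJavaCollection).asScala

    // Lag per partition = log end offset minus the group's committed offset
    committed.foreach { case (tp: TopicPartition, meta) =>
      println(s"$tp lag=${endOffsets(tp) - meta.offset()}")
    }

    consumer.close()
    admin.close()
  }
}
```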

Environment: Hadoop, HDFS, YARN, MapReduce, Hive, PIG, Spark, Scala, Flume, Kafka, Sqoop, Oozie, ER Studio 9.7, Teradata 16, Oracle 12c, Cassandra, GitHub

Confidential, Bellevue, WA

Hadoop Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Imported and exported data into HDFS and Hive using Sqoop.
  • Defined job flows using Oozie and shell scripts.
  • Implemented various customizations in MapReduce by building custom input formats, custom record readers, partitioners and data types in Java (a partitioner sketch follows this list).
  • Ingested data using Flume from web server logs and telnet sources.
  • Installed and configured Cloudera Manager, Hive , Pig, Sqoop , and Oozie on CDH5 cluster.
  • Experienced in managing disaster recovery cluster and responsible for data migration and backup.
  • Performed an upgrade of the development environment from CDH 4.x to CDH 5.x.
  • Implemented encryption and masking of customer-sensitive data in Flume by building a custom interceptor that masks and encrypts the data as required, based on rules maintained in MySQL.
  • Experience in managing and reviewing Hadoop log files.
  • Extracted files from RDBMS sources through Sqoop, placed them in HDFS and processed them.
  • Experience in running Hadoop streaming jobs to process terabytes of XML-format data.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS .
  • Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
  • Implemented Hive-HBase integration by creating Hive external tables that use the HBase storage handler.
  • Executed queries using Hive and developed MapReduce jobs to analyze data.
  • Developed Pig Latin scripts to extract the data from the web server output files to load into HDFS.
  • Developed Hive queries for the analysts.
  • Involved in loading data from Linux and UNIX file systems to HDFS.
  • Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
  • Developed master tables in Hive using a JSON SerDe or the get_json_object and json_tuple functions of Hive.
  • Designed the entire HDFS data flow so that it could be orchestrated with Oozie workflows.
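
For illustration of the custom MapReduce components mentioned above, here is a hypothetical partitioner sketch. The originals were written in Java; this version is kept in Scala for consistency with the other examples in this resume, and the class name and key scheme are illustrative only.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.Partitioner

// Hypothetical custom partitioner: records are routed to reduce tasks by the
// first character of the key, so all keys starting with the same letter are
// processed by the same reducer.
class FirstCharPartitioner extends Partitioner[Text, IntWritable] {
  override def getPartition(key: Text, value: IntWritable, numPartitions: Int): Int = {
    val k = key.toString
    val firstChar = if (k.isEmpty) ' ' else k.charAt(0).toUpper
    firstChar.toInt % numPartitions
  }
}

// Wired into a job (custom input formats and record readers omitted here):
//   job.setPartitionerClass(classOf[FirstCharPartitioner])
```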

Environment: Java (JDK 1.6), Eclipse, Hadoop (Cloudera distribution), HDFS, MapReduce, Hive, PIG, HBase, Cassandra, UNIX Shell Scripting.

Confidential, Atlanta, GA

ETL/Java Developer

Responsibilities:

  • Created the logical data model from the conceptual model, validated it against responses to the business analyst's questionnaire, and converted it into the physical database design.
  • Used the Python library Beautiful Soup for web scraping.
  • Have used SQL queries to perform Data Analysis and Data Profiling.
  • Used Python scripts to update the content in database and manipulate files.
  • Involved in Big Data analytics and Massively Parallel Processing (MPP) architectures like Greenplum and Teradata.
  • Wrote test cases in a Java environment using JUnit.
  • Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from Teradata database.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
  • Worked on physical, logical and conceptual data models.
  • Experience in creating UNIX scripts for file transfer and file manipulation.
  • Validated the data passed to downstream systems.
  • Collaborated with other data modeling team members to ensure design consistency and integrity.
  • Developed Java Beans and Utility Classes for interacting with the database using JDBC.
  • Manually "ingested" (registered and processed) incoming video files and metadata from multiple companies.
  • Performed data reconciliation between integrated systems.
  • Assisted in oversight of compliance with enterprise data standards, data governance and data quality.

Environment: PL/SQL, Tableau, ETL Tools (Informatica 9.5/9.1/8.1), Oracle 11g/9i, Teradata R14, Teradata SQL Assistant 14.0, DataFlux, Quality Center 8.2, SQL, TOAD, Flat Files, Python, Netezza.

Confidential, Chicago, IL

ETL/Java Developer

Responsibilities:

  • Gathered requirements, analyzed and wrote the design documents.
  • Prepared High Level Logical Data Models using Erwin, and later translated the model into physical model using the Forward Engineering technique.
  • Involved in Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
  • Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements in Quality Center.
  • Set up environments to be used for testing and defined the range of functionalities to be tested as per the technical specifications.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Delivered files in various formats (e.g., Excel files, tab-delimited text, comma-separated text, pipe-delimited text, etc.).
  • Performed data reconciliation between integrated systems.
  • Produced metrics reporting, data mining and trend analysis in a helpdesk environment using Access.
  • Created and monitored workflows using Workflow Designer and Workflow Monitor.
  • Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked on data quality issues.
  • Identified and recorded defects with the information required for issues to be reproduced by the development team.

Environment: PL/SQL, Business Objects XIR2, ETL Tools (Informatica 9.5/8.6/9.1), Oracle 11g, Teradata V2R12/R13.10, Teradata SQL Assistant 12.0, DB2, Java, Business Objects, SQL, SQL Server 2000/2005, UNIX, Shell Scripting, Quality Center 8.2

Confidential

ETL Developer

Responsibilities:

  • Involved in Data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed and sent to an external entity.
  • Analyzed business requirements, system requirements, data mapping requirement specifications, and responsible for documenting functional requirements and supplementary requirements in Quality Center.
  • Tested Complex ETL Mappings and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Wrote and executed unit, system, integration and UAT scripts in a data warehouse project.
  • Developed separate test cases for ETL process (Inbound & Outbound) and reporting.
  • Involved in Teradata SQL development, unit testing and performance tuning, and ensured testing issues were resolved on the basis of defect reports.
  • Tested the ETL process both before and after the data validation process; tested the messages published by the ETL tool and the data loaded into various databases.
  • Documented and published test results; troubleshot and escalated issues.
  • Prepared various test documents for the ETL process in Quality Center.
  • Involved in Test Scheduling and milestones with the dependencies.

Environment: Informatica 8.1, DataFlux, Oracle 9i, Quality Center 8.2, SQL, TOAD, PL/SQL, Flat Files, Teradata, Windows XP, Informatica PowerCenter 6.1/7.1, QTP 9.2, Test Director 7.x, LoadRunner 7.0, Oracle 10g, UNIX AIX 5.2, Perl, Shell Scripting
