
Sr. Hadoop/Spark Developer Resume


Battle Creek, MI

SUMMARY

  • Around 3+ years of experience with Big Data and the Hadoop ecosystem, including components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Oozie, and Zookeeper.
  • Worked extensively on installing and configuring Hadoop ecosystem components such as Hive, Sqoop, Pig, HBase, Zookeeper, and Flume.
  • Hands-on experience with the YARN (MapReduce 2.0) architecture and its components (Resource Manager, Node Manager, Container, Application Master) and with the execution of a MapReduce job.
  • Well versed in design and architecture principles for implementing big data systems.
  • Implemented ETL operations on big data platforms.
  • Developed data pipelines and applied business logic using Spark.
  • Experienced in integrating Kafka with Spark Streaming for high-speed data processing; developed data pipelines using Kafka and Spark Streaming (a minimal sketch follows this summary).
  • Exposure to data lake implementation using Apache Spark.
  • Used Spark to improve the performance and optimization of existing algorithms in Hadoop, working with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Worked on data extraction, transformation, and loading in Hive, Pig, and HBase.
  • Well versed in creating HBase tables to store variable data formats coming from third-party sources.
  • Performed different ETL operations using Pig.
  • Worked on data serialization formats (Avro, Parquet, CSV) for converting complex objects into byte sequences.
  • Hands-on experience with sequence files, RC files, dynamic partitions, and bucketing for best practices and performance improvement.
  • Experience in designing time-driven and data-driven automated workflows using Oozie.
  • Experience in configuring Zookeeper to coordinate the servers in a cluster and maintain data consistency.
  • Experience in extracting source data from sequential files, XML, JSON, and other file formats, then transforming and loading it into the target data warehouse using Sqoop with Bash scripts.
  • Hands-on experience in configuring and working with Flume to load data from multiple sources directly into HDFS.
  • Very good understanding of SQL, ETL, and data warehousing technologies.
  • Experience with Teradata utilities, aggregates, views, Teradata SQL, and UNIX.
  • Good knowledge of Teradata concepts and Teradata utilities.
  • Familiar with BTEQ, FastLoad, and FastExport scripts.
  • Implemented performance tuning at various levels.
  • Used EXPLAIN plans and COLLECT STATISTICS.
  • Expertise in relational databases such as Oracle, MySQL, and SQL Server.
  • Experience in implementing projects in both Agile and Waterfall methodologies.
  • Well versed in the sprint ceremonies practiced in Agile methodology.
  • Strong experience with data warehousing and ETL concepts using Informatica PowerCenter, Tableau, OLAP, OLTP, and AutoSys.
  • Highly involved in all phases of the SDLC: analysis, design, development, integration, implementation, debugging, and testing of software applications in a client-server environment.
  • Strong analytical and problem-solving skills; highly motivated, good team player with good communication and interpersonal skills.
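
A minimal sketch of the Kafka + Spark Streaming ingestion described in this summary, written in Java against the spark-streaming-kafka-0-10 integration. The broker address, topic name, consumer group, and HDFS output path are illustrative placeholders, not values from any of the projects below.

    import java.util.*;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.*;
    import org.apache.spark.streaming.kafka010.*;

    public class KafkaSparkIngest {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("KafkaSparkIngest");
            // 10-second micro-batches
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092");   // placeholder broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "ingest-group");            // placeholder consumer group
            kafkaParams.put("auto.offset.reset", "latest");

            Collection<String> topics = Collections.singletonList("events");  // placeholder topic

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                KafkaUtils.createDirectStream(
                    jssc,
                    LocationStrategies.PreferConsistent(),
                    ConsumerStrategies.<String, String>Subscribe(topics, kafkaParams));

            // Keep only the message payloads and land each micro-batch in HDFS.
            stream.map(ConsumerRecord::value)
                  .foreachRDD((rdd, time) ->
                      rdd.saveAsTextFile("hdfs:///data/raw/events/" + time.milliseconds()));

            jssc.start();
            jssc.awaitTermination();
        }
    }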

TECHNICAL SKILLS

Hadoop Ecosystem Components: Hadoop, MapReduce, Hive, Pig, YARN, Kafka, Flume, Sqoop, Impala, Oozie, HBase, Zookeeper, Spark.

Hadoop Distributions: Cloudera (CDH3, CDH4, and CDH5), Hortonworks.

Methodologies: Agile, Waterfall.

Languages: SQL, C#, VB.NET, Python, Scala, Java, HTML, XML and C/C++.

Databases: Teradata, Oracle, DB2, MS-SQL Server, MySQL, MS-Access.

NoSQL Databases: HBase, Cassandra, and MongoDB.

ETL and Reporting Tools: Informatica, Talend, Tableau.

Tools and Utilities: SQL Assistant, BTEQ, FastLoad, MultiLoad, FastExport.

Build Tools: Maven, Ant, Sbt.

Version Control Tools: SVN, Git, GitHub

Operating Systems: Red Hat Linux, UNIX, Windows.

PROFESSIONAL EXPERIENCE

Confidential, Battle Creek, MI

Sr. Hadoop/Spark Developer

Responsibilities:

  • Ingested data from RDBMS into HDFS using Sqoop.
  • Created Hive tables to store various data formats coming from different portfolios.
  • Responsible for managing incoming data from different data sources.
  • Created partitions and buckets based on state to support further processing with bucket-based Hive joins.
  • Extensively used Hive and Pig for data cleansing.
  • Developed multiple Kafka producers and consumers for Spark Streaming.
  • Supported development of the application architecture for both real-time and batch big data processing.
  • Developed Spark scripts using the Scala shell.
  • Developed UDFs in Java to extend the functionality of Hive scripts (a minimal sketch follows this list).
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Designed and executed time-driven and data-driven Oozie workflows.
  • Created secondary indexes for joining multiple HBase tables.
  • Worked on HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Configured the connection between Hive and Tableau using Impala for the BI developer team.
  • Exported the analyzed data to the business layer and generated reports using Tableau.
  • Prepared technical design documents and detailed design documents.
  • Worked with different file formats (ORC, text) and different compression codecs (Gzip, Snappy, Bzip2).
  • Wrote MapReduce programs for data validation.
  • Involved in performance tuning of Spark jobs using caching and by taking full advantage of the cluster environment.
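
As an illustration of the Hive UDF work mentioned above, here is a minimal old-style Hive UDF in Java; the function name and the normalization it performs are hypothetical examples, not the actual UDFs built on the project.

    import org.apache.hadoop.hive.ql.exec.Description;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical example: normalize free-form state codes before partitioned loads.
    @Description(name = "normalize_state", value = "_FUNC_(str) - trims and upper-cases a state code")
    public final class NormalizeState extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;                       // preserve SQL NULL semantics
            }
            return new Text(input.toString().trim().toUpperCase());
        }
    }

Once packaged into a JAR, a UDF like this is registered from Hive with ADD JAR and CREATE TEMPORARY FUNCTION before it can be called in HiveQL scripts.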

Environment: Hadoop, Hive, Linux, MapReduce, Sqoop, Kafka, Spark, HBase, Oozie, Shell Scripting, Scala, Maven, Java, Tableau, MySQL, Teradata, Oracle, Agile Methodologies.

Confidential, Columbus, OH

Hadoop Developer

Responsibilities:

  • Involved in gathering and analyzing business requirements, and designing the Hadoop stack per the requirements.
  • Worked with Sqoop import and export functionality to handle large data set transfers between the Oracle database and HDFS.
  • Developed UNIX shell scripts to load a large number of files into HDFS from the Linux file system.
  • Moved data from a relational database using Sqoop into Hive dynamic partition tables and external tables.
  • Wrote Flume configuration files for importing streaming log data into HBase with Flume.
  • Responsible for performing extensive data validation using Hive.
  • Developed and executed shell scripts to automate the jobs.
  • Wrote complex Hive queries and UDFs.
  • Extensively used Pig to cleanse and pre-process the data for analysis.
  • Performed validation and standardization of raw data from XML and JSON files with Pig and MapReduce.
  • Implemented MapReduce programs to classify data into different categories based on record type.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Wrote MapReduce jobs for data processing, with the results stored in HBase for BI reporting.
  • Worked on text files, sequence files, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
  • Implemented complex MapReduce programs in Java to perform map-side joins using the distributed cache (a minimal sketch follows this list).
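
A minimal sketch of a map-side (replicated) join of the kind described in the last bullet, using Hadoop's mapreduce API and the distributed cache. The class names, the comma-delimited "key,value" layout, and the argument order are illustrative assumptions.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MapSideJoin {

        public static class JoinMapper extends Mapper<LongWritable, Text, Text, Text> {
            private final Map<String, String> lookup = new HashMap<>();

            @Override
            protected void setup(Context context) throws IOException {
                // The small lookup file was shipped to every node via the distributed cache.
                URI[] cacheFiles = context.getCacheFiles();
                try (BufferedReader reader = new BufferedReader(
                        new FileReader(new Path(cacheFiles[0].getPath()).getName()))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        String[] parts = line.split(",", 2);   // assumes "key,value" layout
                        if (parts.length == 2) {
                            lookup.put(parts[0], parts[1]);
                        }
                    }
                }
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",", 2); // join key assumed first
                if (fields.length < 2) {
                    return;
                }
                String joined = lookup.get(fields[0]);
                if (joined != null) {
                    context.write(new Text(fields[0]), new Text(fields[1] + "," + joined));
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "map-side join");
            job.setJarByClass(MapSideJoin.class);
            job.setMapperClass(JoinMapper.class);
            job.setNumReduceTasks(0);                             // map-only join
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            job.addCacheFile(new URI(args[0]));                   // small lookup file
            FileInputFormat.addInputPath(job, new Path(args[1])); // large input data
            FileOutputFormat.setOutputPath(job, new Path(args[2]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }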

Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, Flume, HBase, Pig, Java, SQL, UNIX, Shell Scripting, Oracle.

Confidential, Tampa, FL

Hadoop Developer

Responsibilities:

  • Worked with Sqoop to import/export data into HDFS and Hive.
  • Wrote Pig programs to load and filter streaming data brought into HDFS using Flume.
  • Moved large amounts of data into HBase using MapReduce integration.
  • Wrote MapReduce programs to clean and aggregate the data.
  • Developed an HBase data model on top of HDFS data to perform real-time analytics using the Java API (a minimal sketch follows this list).
  • Worked on different kinds of custom filters and handled pre-defined filters on HBase using the client API.
  • Developed counters on HBase data to count total records in different tables.
  • Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
  • Implemented secondary sorting to sort reducer output globally in MapReduce.
  • Implemented a data pipeline by chaining multiple mappers using ChainMapper.
  • Created Hive dynamic partitions to load time-series data.
  • Experienced in handling different types of joins in Hive, such as map joins, bucket map joins, and sorted bucket map joins.
  • Imported/exported data between HDFS/Hive and relational databases and Teradata using Sqoop.
  • Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
  • Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
  • Actively participated in the software development lifecycle, including design and code reviews.
  • Involved in story-driven Agile development methodology and actively participated in daily scrum meetings.
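
A minimal HBase Java client sketch along the lines of the real-time analytics, filter, and counter bullets above, assuming the HBase 1.x client API; the table name "events", column family "d", and row-key layout are placeholders. It writes one cell, then scans with a pre-defined SingleColumnValueFilter and counts the matching rows.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.filter.CompareFilter;
    import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBaseEventsClient {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();     // picks up hbase-site.xml
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("events"))) {

                // Write one row: placeholder row key = source id + timestamp, column family "d".
                Put put = new Put(Bytes.toBytes("src1-20180101T000000"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("status"), Bytes.toBytes("OK"));
                table.put(put);

                // Scan with a pre-defined filter: only rows whose d:status equals "OK".
                Scan scan = new Scan();
                scan.setFilter(new SingleColumnValueFilter(
                        Bytes.toBytes("d"), Bytes.toBytes("status"),
                        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("OK")));

                long count = 0;
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result result : scanner) {
                        count++;                                   // simple record counter
                    }
                }
                System.out.println("Matching rows: " + count);
            }
        }
    }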

Environment: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, flat files, MySQL, CSV, Avro data files.

Confidential

Teradata/ETL Developer

Responsibilities:

  • Created scripts to load data into the staging server through FastExport; the data was later cleansed, transformed, integrated, and loaded into the database through MultiLoad and BTEQ.
  • Developed FastLoad and MultiLoad scripts in control files, and developed BTEQ scripts to process the data on the staging server.
  • Expertise in SQL and performance tuning on a large-scale Teradata database.
  • Performed enhancements, maintenance, and development for DWH applications.
  • Prepared requirements documents, SRS documents, unit test cases, deployment guides, user manuals, technical documentation, etc.
  • Coordinated change requests, troubleshooting, and regular status reporting.
  • Followed up with system test teams on bug fixes and closing defects.
  • Using EXPLAIN plans, analyzed and modified indexes and rewrote queries with derived or temporary tables to improve performance; also utilized Teradata Viewpoint.
  • Analyzed and designed USIs and NUSIs based on the columns used in joins during data retrieval.
  • Used the BTEQ and SQL Assistant front-end tools to issue SQL commands matching the business requirements to the Teradata RDBMS.
  • Developed physical models with appropriate primary, secondary, PPI, and join indexes, taking into consideration both the planned access of data and the even distribution of data across all available AMPs.
  • Developed workflows and automated them with UNIX shell scripts.
  • Handled production issues by coordinating between the client and the development team.

Environment: UNIX, Teradata VR13, ETL, SQL Assistant, MultiLoad, FastLoad, BTEQ, FastExport, and Teradata utilities.

Confidential

Teradata Developer

Responsibilities:

  • Worked on loading data from several flat-file sources into the staging area using Teradata utilities.
  • Based on end-user and business-user requirement changes, created and implemented change requests, modified the code accordingly, tested and moved it into production, and worked on maintenance of the jobs as well.
  • Performed data validation and verification between source system data and the target system data that had been processed and loaded.
  • Worked and coordinated with various source system teams, system analysts, testing teams, and other vendors, attending regular triage meetings and discussions.
  • Wrote several Teradata BTEQ scripts to implement the business logic.
  • Worked on tuning and troubleshooting to improve the performance of the scripts and load utilities, and tuned queries to increase processing speed.
  • Performed unit testing, regression testing, and integration testing between various modules and systems.
  • Assisted the testing team members in developing scripts for automated testing, provided input for the preparation of test plans and test cases, and helped them during the execution process.

Environment: Teradata (V2R5), Windows XP, DDL, BTEQ, FastLoad, MultiLoad.
