
Senior Hadoop Engineer/Tech Lead/Data Scientist Resume


IL

SUMMARY

  • 10+ years of IT experience in data warehousing as a Senior Developer and Data Solution Architect in Hadoop 0.20/2.2/2.6, CDH 5.5/5.10, ETL tools DataStage v7.5/8.5/9.1/11.3 and Informatica PowerCenter 8.6/9.1/9.6.
  • 4+ years of experience using Hadoop and its ecosystem technologies (HDFS, Spark, Python, Scala, Hive, Pig, Impala, Sqoop, Oozie, YARN, MapReduce, HBase, Flume).
  • Strong experience and knowledge in Spark, Scala, Python, data structures, DataFrames, partitioning, performance techniques and RDD concepts.
  • Strong experience in understanding business requirements, technical and existing business processes, and legacy system operations.
  • Experience in using different file formats (Avro, Parquet, Sequence Files, ORC, JSON) and in Hadoop performance tuning, optimization, customization and dataset creation.
  • Extensive experience in data profiling and data migration from various legacy sources to OLAP/OLTP target applications.
  • Strong data modeling experience; designed ER diagrams, 3NF, dimensional/hierarchical data models, and Star and Snowflake schema data models using Erwin.
  • Major job responsibilities include designing and developing physical and logical data models and data extraction from source to target.
  • Strong experience in ETL metadata management and in using several transformations to perform data profiling, data cleansing and data transformation, as well as end-to-end data setup for quality testing.
  • Strong Oracle skills that include building/maintaining tables and indexes, PL/SQL scripts, stored procedures, triggers, packages, functions and performance tuning of SQL queries.
  • Experience in working with XML, Tableau, XSD, Web Services and MQ.
  • Strong experience in troubleshooting bugs and providing L2 support for Hadoop, Informatica and DataStage jobs.
  • Experience in the assessment of existing platform architecture frameworks, evaluating and selecting software and hardware technologies, process definition, re-engineering and implementation of data strategy.
  • Strong experience with Unix/Linux systems, shell scripting, complex Linux activities, and understanding of approaches for business intelligence, data marts and data warehouses.
  • Good knowledge of other Big Data platforms such as IBM BigInsights and Hortonworks.
  • Experience in doing POCs on Data analytics projects using R, Python.
  • Expertise in design, development, requirements gathering, system analysis, technical documentation and flow charts, team management, test and data management, client relationships and product delivery.
  • Strong expertise in the complete SDLC, ITIL service management and production support functions, having handled key responsibilities as Module Lead, Team Lead and Tech Lead over the years.
  • Knowledge and experience in SOA, CMM, Agile and Waterfall methodologies.
  • Experienced in coordinating and communicating with outside vendors and offshore resources in support of timelines and IT project deliverables.

TECHNICAL SKILLS

Big Data: Hadoop, HDFS, Hive, Sqoop, Pig, Oozie, Impala, MapReduce, HBase, Spark

ETL Technology: Informatica PowerCenter 9.6/9.1/8.6/7.1 and DataStage 7.5/8.5/9.1/11.3

Data Warehouse: Multidimensional Model Design, Star Schema Development

Data Modeling: Star-Schema Modeling, Snowflake Modeling, Erwin, IBM QualityStage

Operating Systems: MVS OS/390, UNIX, Linux, Solaris, Windows

Databases: Oracle 8i/9i/10g/11g, Teradata, MS SQL Server 2008/2012, MySQL

Tools: Autosys, Qlik Sense/QlikView, Tableau, SQL*Plus, Toad, Control-M, Teradata SQL Assistant

Languages: Python, Scala, Java, SQL, UNIX Shell Scripting, PL/SQL.

PROFESSIONAL EXPERIENCE

Confidential, IL

Senior Hadoop Engineer/Tech Lead/Data Scientist

Environment: CDH 5.5.1/5.10, Hadoop 2.6.0, Python, Spark, Scala, Hive, Apache Pig, Sqoop, Oozie, Java, Impala, YARN, HBase, Phoenix, R, Unix, Oracle, Teradata, MySQL, Solr, Autosys.

Responsibilities:

  • Understanding the business requirements, current technology challenges & issues.
  • Preparing Design Documents (Request-Response Mapping Documents, Hive Mapping Documents), Data Mapping documents.
  • Implementing the design using Scala, Spark, Python, Hive, HBase, Pig, Java and Sqoop scripts.
  • Involved in migrating projects to the new Hadoop platform and in Hadoop cluster migration activities.
  • Preparing Oozie workflows, Autosys jobs and Korn shell jobs, and promoting the code to DEV, QA and PROD environments.
  • Involved in setting up continuous integration of application code to higher-level environments using the SVN code repository.
  • Involved in creating a Python framework that integrates with application code to schedule, run and load data into the data lake efficiently, providing data feeds to all application development.
  • Involved in creating Datasets/DataFrames/RDDs and transferring warehouse data into the HDFS file system (a sketch of this load pattern follows this list).
  • Performing various POCs as per the business requirements.
  • POCs on Akka and Spray to use as microservices for Hadoop.
  • POCs on R, Python to create Data analytics reports on Supply Chain Data.
  • POCs on Ruby/Cucumber and RubyMine to implement a testing process in the environment.
  • Implementing Performance Optimization techniques
  • Providing feasibility reports of various file formats, compression techniques.
  • Involved in technical architecture and design review assignments for a quality solution implementation of the new target data warehouse in HDFS.
  • Developed solutions to pre-process large sets of structured and semi-structured data in different formats (text files, Avro, Parquet, Sequence Files, JSON records).
  • Performed logical and physical dimensional modeling for Hadoop using data schema methodologies so that data warehouse table structures could be created and implemented in a similar way in Hadoop.
  • Involved in transforming, cleansing and analyzing data coming from sources using the Spark, Scala, Python, Hive, HBase and Pig frameworks.
  • Involved in support activities for the projects in production and troubleshot production issues promptly.
  • Analyzed data from a variety of data sources to present a cohesive view of the data and completed data profiling on various source systems to understand the data.
  • Involved in incremental updates of the latest data for current cycles from the warehouse system to the HDFS file system.
  • Involved in data analysis and architecture activities for creating conceptual, logical and physical data models prior to development.
  • Designed the environment to facilitate the core components such as source data push/pull, transformations on data, loading data into the Hive warehouse, and archiving source data for backup and recovery.
  • Performed a deep assessment of the system to determine whether ETL best practices were followed and submitted recommendations for improvement.
  • Solid experience in handling an offshore team, assigning tasks, client interaction and business/functional team meetings.
  • Extensive experience in writing SQL queries, stored procedures, functions, packages, database triggers and exception handlers.
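
A minimal sketch of the DataFrame-based warehouse-to-HDFS load pattern referenced above, assuming a Spark 2.2+ session with Hive support; the landing path, database and table names (for example supply_chain.orders_refined) and column names are hypothetical placeholders rather than the actual project objects.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Hypothetical landing path and Hive target used purely for illustration.
    LANDING_PATH = "/data/landing/orders"          # raw warehouse extract (JSON records)
    TARGET_TABLE = "supply_chain.orders_refined"   # partitioned Hive table

    spark = (SparkSession.builder
             .appName("warehouse-to-datalake-load")
             .enableHiveSupport()
             .getOrCreate())

    # Read the raw extract, de-duplicate and derive the partition key.
    raw_df = spark.read.json(LANDING_PATH)
    refined_df = (raw_df
                  .dropDuplicates(["order_id"])
                  .withColumn("order_ts", F.to_timestamp("order_ts"))
                  .withColumn("load_dt", F.to_date("order_ts")))

    # Write as Parquet into a Hive table partitioned by load date.
    (refined_df.write
     .mode("overwrite")
     .format("parquet")
     .partitionBy("load_dt")
     .saveAsTable(TARGET_TABLE))

In practice a job like this would be triggered from the Oozie/Autosys schedules mentioned above rather than run ad hoc.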

Confidential

Senior Hadoop Engineer/Big Data Architect

Environment: CDH 5.3.2, Hadoop 0.20/2.2, MapReduce, Hive, Apache Pig, Sqoop, Oozie, Java, Python, Spark, Scala, Impala, YARN, HBase, Unix, Oracle, Teradata V13.0, DataStage v8.5/9.1, Control-M.

Responsibilities:

  • Understanding the business requirements, current technology challenges & issues.
  • Preparing Design Documents (Request-Response Mapping Documents, Hive Mapping Documents).
  • Implementing the design using Hive scripts, Spark (PySpark) scripts, UDFs and Sqoop scripts (see the UDF sketch after this list).
  • Preparing Oozie workflow, Korn Shell jobs and pushing the code to DEV, PROD environments.
  • Involved in Dataset creation and transferring the Warehouse data into HDFS file system.
  • Performing various POCs as per the business requirements.
  • Project Automation using Oozie & Shell jobs
  • Implementing Performance Optimization techniques
  • Providing feasibility reports of various file formats, compression techniques.
  • Involved in technical architecture and design review assignments for a quality solution implementation of the new target data warehouse in HDFS.
  • Developed solutions to pre-process large sets of structured and semi-structured data in different formats (text files, Avro, Sequence Files, JSON records).
  • Performed logical and physical dimensional modeling for Hadoop using data schema methodologies so that data warehouse table structures could be created and implemented in a similar way in Hadoop.
  • Involved in transforming, cleansing and analyzing data coming from sources using the Spark, Python, Hive and Pig frameworks.
  • Analyzed data from a variety of data sources to present a cohesive view of the data and completed data profiling on various source systems to understand the data.
  • Involved in incremental updates of the latest data for current cycles from the warehouse system to the HDFS file system.
  • Involved in data analysis and architecture activities for creating conceptual, logical and physical data models prior to development.
  • Designed the environment to facilitate the core components such as source data push/pull, transformations on data, loading data into the Hive warehouse, and archiving source data for backup and recovery.
  • Performed a deep assessment of the system to determine whether ETL best practices were followed and submitted recommendations for improvement.
  • Extensive Experience in writing SQL queries, stored procedures, functions, packages and database triggers, exception handlers.
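
A minimal illustration of the PySpark UDF approach mentioned above, assuming Spark 2.x with Hive support; the staging and target table names and the cleansing rule are hypothetical examples, not the actual project logic.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
             .appName("udf-cleansing-example")
             .enableHiveSupport()
             .getOrCreate())

    # Hypothetical cleansing rule: normalise free-text product codes before loading to Hive.
    def normalize_code(code):
        return code.strip().upper().replace(" ", "_") if code else None

    normalize_code_udf = udf(normalize_code, StringType())

    # staging.products_raw stands in for a table populated earlier by Sqoop.
    src_df = spark.table("staging.products_raw")
    clean_df = src_df.withColumn("product_code", normalize_code_udf("product_code"))
    clean_df.write.mode("overwrite").saveAsTable("warehouse.products_clean")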

Confidential

Senior Hadoop Developer

Environment: CDH 5.3.2, Hadoop 0.20/2.2, MapReduce, Hive, Apache Pig, Sqoop, Oozie, Java, Python, YARN, HBase, Unix, Oracle, Teradata V13.0, DataStage v8.5/9.1, Control-M.

Responsibilities:

  • Prepared the Design Documents (Sequence Diagrams, External Design Report & Internal Design Report).
  • Developed solutions to pre-process large sets of structured, semi-structured data, with different formats (Text Files, Avro, Sequence Files, JSON Records)
  • Worked on POCs and suggested solutions to fix performance & storage issues using different Compression Techniques, Custom Combiners, Custom Partitioners.
  • Developed Pig and Hive/Impala scripts to parse the raw data, populate staging tables and store the refined data in partitioned Hive tables.
  • Developed utility classes in Hive using UDFs and wrote scripts to automatically Sqoop several hundred tables from Hive to MySQL (a sketch of this automation follows this list).
  • Fixed various data transformation issues in Pig, Hive/Impala & Sqoop.
  • Hands-on experience with NoSQL databases like HBase for a POC (proof of concept) storing URLs and images (semi-structured and unstructured data coming from UNIX and a variety of portfolios).
  • Experience in using Big Data testing frameworks such as MRUnit and PigUnit for testing raw data, and executed performance scripts.
  • Assisted with data capacity planning and node forecasting.
  • Analyzed migration plans for various versions of the Cloudera Distribution of Apache Hadoop to draw up the impact analysis and proposed a mitigation plan for the same.
  • Experience in analyzing the latest versions, solutions and software in the Hadoop ecosystem and proposing them to stakeholders, with proven solutions, for a better project architecture.
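
A simplified sketch of the kind of script used to automate Sqoop exports of many Hive tables to MySQL, as described above; the connection string, credentials path, table list and warehouse directory are placeholders, and the options shown are common Sqoop export flags rather than the exact production settings.

    import subprocess

    # Placeholder connection details and table list; in practice these were driven by metadata.
    MYSQL_URL = "jdbc:mysql://mysql-host:3306/reporting"
    HIVE_WAREHOUSE = "/user/hive/warehouse/reporting.db"
    TABLES = ["orders", "customers", "shipments"]

    for table in TABLES:
        cmd = [
            "sqoop", "export",
            "--connect", MYSQL_URL,
            "--username", "etl_user",
            "--password-file", "/user/etl/.mysql_pwd",
            "--table", table,
            "--export-dir", "{0}/{1}".format(HIVE_WAREHOUSE, table),
            "--input-fields-terminated-by", "\\001",  # default Hive text delimiter
            "--num-mappers", "4",
        ]
        # Fail fast if any export fails so the scheduler can alert and retry.
        subprocess.check_call(cmd)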

Confidential

Senior Software Engineer

Environment: Informatica PowerCenter 9.6/9.1/8.6, UNIX, shell scripting, Control-M, Erwin, Teradata, Oracle 10g/11g, PL/SQL, MySQL.

Responsibilities:

  • Involved in requirements gathering and design-phase meetings for business analysis as per the business needs.
  • Prepared HLDs and LLDs based on TRDs provided by the Business Systems Analyst.
  • Responsible for the overall design of the data/information architecture, which maps to the enterprise architecture and balances the need for access against security and performance requirements.
  • Initiated Design of ETL architecture solution assessments to design data quality rules and reference data.
  • Involved in data design and ETL efforts to build the data warehouse, including writing design specifications and finalizing the DWH ETL architecture.
  • Performed data modeling for the DWH following the Star Schema methodology of building the data warehouse.
  • Identified data quality issues by analyzing data in various systems and worked with business users to implement data cleansing processes.
  • Managed the program/project roadmap and key milestones regarding criticality and downstream impact if dates were missed, and determined alternative/mitigating actions.
  • Developed UNIX scripts for validating source files and created transformation and load jobs for 4 modules (Ongoing Advice, Advice Details, Case Details, Advice Fee Payment).
  • Developed cross-functional project plans and managed execution of tasks and completion of all deliverables within the business process and technical areas.
  • Tuned SQL queries using EXPLAIN PLAN, hints and SQL*Plus Autotrace reports for query optimization.
  • Extensive experience in writing SQL queries, stored procedures, functions, packages, database triggers, exception handlers, cursors, and PL/SQL records and tables.
  • Worked with different information systems such as Oracle, Teradata and mainframes.
  • Preparing Deployment and Testing plans for SIT, UAT and Production Migration.
  • Analyzed defects raised, provided technical support, Production bug fixings and further enhancements.
  • Raising the Change Requests in GSD R12 to release the job changes to Production and change impact analysis.
  • Assigned tasks to junior team members across all modules and helped them in all areas whenever they needed assistance.
  • Prepare, maintain and publish ETL documentation, including source-to-target Jobs and business-driven transformation rules.

Confidential

Software Engineer

Environment: Informatica PowerCenter 8.6, IBM InfoSphere Information Server DataStage v7.5/8.5, UNIX, shell scripting, Control-M, Erwin, Teradata, Oracle 10g/11g, PL/SQL, MySQL, SQL*Plus, SQL*Loader.

Responsibilities:

  • Migrated Informatica flows to IBM DataStage.
  • Created migration documents.
  • Reviewed existing data dictionaries and source-to-target mappings.
  • Participated in data modeling discussions, exchanging ideas to arrive at the best model to efficiently serve both ETL and reporting teams.
  • Designed complex DataStage mappings and reusable transformations to facilitate initial, incremental and CDC data loads, parameterized to source from multiple systems.
  • ETL migration and administration.
  • Extensive Experience in writing SQL, PL/SQL queries, stored procedures, functions, packages and database triggers, exception handlers.

Confidential

Software Engineer

Environment: Informatica PowerCenter 7.1/8.6, MySQL, Oracle, Toad, UNIX Shell Scripting, Flat Files, Mercury Quality Center, Control-M.

Responsibilities:

  • Involved in requirement gathering, analysis, design, development and implementation into the system.
  • Extensive Experience in writing SQL, PL/SQL queries, stored procedures, functions, packages and database triggers, exception handlers.
  • Created tables and indexes
  • Created data dictionaries for source and target tables
  • Interacted with business users/Business Analysts to gather requirements and understand the timelines.
  • Reviewed the high-level Business Requirements Document (BRD) and prepared design documents and source-to-target mapping (STM) documents.
  • Experience in Oracle Business Intelligence Enterprise Edition (OBIEE)
  • Involved in developing both data processing PL/SQL modules and analytical SQL queries for reports
  • Reviewed and tuned Informatica mappings and sessions using techniques such as session partitioning, incremental aggregation, persistent lookup caches and pre-sorting for aggregation.
  • Involved in creating production release documents and migrating code.
  • Prepared ETL documentation and implementation (built code packages and raised the necessary requests to release code to production).
