
Sr. Big Data Consultant Resume


Atlanta, USA

PROFESSIONAL SUMMARY:

  • Overall 11+ years of IT experience in data warehousing technologies and around 3 years of experience in Big Data technologies along with AWS cloud services.
  • Expertise in large scale Big Data Architecture planning and design, application development, deployment and migration of traditional Data Warehouse solutions to Hadoop based Integrated Data Lakes and Enterprise Data Hub (EDH) through acquisition of data across the enterprise.
  • Experience in Hadoop ecosystem technologies like Sqoop, Hive, Spark, Oozie, Zookeeper and Kafka for incremental and real-time data ingestion from varied sources.
  • Work closely with the client and the broader Architecture, Platform and Delivery teams to implement, in Agile fashion, the architecture and chosen AWS services using AWS best practices and principles from the AWS Well-Architected Framework.
  • Possess in-depth working knowledge and hands-on development experience in AWS Kinesis Data Streams, Kinesis Firehose, S3, Redshift, Glue, Athena and EMR.
  • Real-time ingestion and processing wif Kinesis Streams, Kinesis Firehose, and Kinesis Analytics.
  • Good knowledge of Hadoop ecosystem components such as HDFS, YARN and MapReduce, and the Spark framework.
  • Experience in working with relational databases like DB2 and Oracle, and NoSQL databases like HBase and MongoDB.
  • Expertise in writing Spark and Spark SQL scripts using PySpark.
  • Experience writing HiveQL queries for data analysis.
  • Good understanding of Star and Snowflake schemas and all data warehousing concepts.
  • Implemented Slowly Changing Dimension Type 1, Type 2 and Type 3 for inserting and updating target tables while maintaining history. Have implemented the CDC concept in data warehousing for incremental data loads (see the sketch after this list).
  • Involved in various phases of the software development life cycle right from Requirements gathering, Analysis, Design, Development, and Testing to Production.
  • Performed production support activities in Data Warehouse (Informatica), including monitoring and resolving production issues, following up on information requests and bug fixes, and supporting end users.
  • Authorized to work in United States for any employer
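
A minimal PySpark sketch of the SCD Type 2 / CDC pattern referenced above follows. The database, table and column names (dw.customer_dim, staging.customer_cdc, customer_id, address, etc.) are hypothetical placeholders; a production version would also generate surrogate keys and handle deletes.

```python
# Minimal SCD Type 2 sketch in PySpark. All table/column names are hypothetical;
# a real implementation would also manage surrogate keys and deletes.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("scd2_sketch").enableHiveSupport().getOrCreate()

dim = spark.table("dw.customer_dim")        # business cols + is_current, start_date, end_date
cdc = spark.table("staging.customer_cdc")   # incremental (CDC) batch, business cols only

cur, hist = dim.filter("is_current = 1"), dim.filter("is_current = 0")

# Keys whose tracked attribute changed versus the current active version.
changed = (cdc.alias("s")
              .join(cur.alias("d"), "customer_id")
              .filter(F.col("s.address") != F.col("d.address"))
              .select("s.*"))

# Type 2: expire the current version of each changed key instead of overwriting it.
expired = (cur.join(changed.select("customer_id"), "customer_id", "left_semi")
              .withColumn("is_current", F.lit(0))
              .withColumn("end_date", F.current_date()))

untouched = cur.join(changed.select("customer_id"), "customer_id", "left_anti")

# New versions for changed keys, plus keys that are entirely new to the dimension.
incoming = changed.unionByName(
    cdc.join(dim.select("customer_id").distinct(), "customer_id", "left_anti"))
new_versions = (incoming.withColumn("is_current", F.lit(1))
                        .withColumn("start_date", F.current_date())
                        .withColumn("end_date", F.lit(None).cast("date")))

result = hist.unionByName(expired).unionByName(untouched).unionByName(new_versions)
result.write.mode("overwrite").saveAsTable("dw.customer_dim_rebuilt")
```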

TECHNICAL SKILLS:

Big Data Ecosystems: Cloudera Hadoop 4.3, Hortonworks (HDP 2.5), HDFS, MapReduce, YARN, Hive, Sqoop, Spark (Spark SQL, Spark Streaming), Kafka, NiFi

AWS Services: Kinesis Data Streams, Kinesis Firehose, S3, EC2, Redshift, Redshift Spectrum, EMR, VPC, IAM, Glue, Athena, QuickSight, Data Pipeline, DynamoDB, etc.

Programming Languages: Python, PL/SQL

Querying Languages: SQL, HiveQL

File Systems / File Formats: HDFS, Linux, XML, Avro, JSON, Parquet

Databases: Hive, DB2, Oracle, NoSQL (HBase, MongoDB)

Schedulers: Oozie, Autosys, DAC

Service Delivery Mgmt: ServiceNow, Remedy, Jira

ETL Tools: Informatica Power Center 10.X/9.X/8.X

Operating Systems: Windows 2012 R2/2008/2007/2005/NT/XP, UNIX/Linux

PROFESSIONAL EXPERIENCE:

Confidential, Atlanta, USA

Sr. Big Data Consultant

Responsibilities:

  • Design and develop scalable data processing pipelines and applications using AWS big data technologies such as Glue ETL with PySpark, Hive and EMR (Sqoop), along with AWS data and analytics services such as Redshift and Athena, on top of a data hub on S3 (see the sketch after this list).
  • Daily activities involve development, testing, meeting with technology leadership to work through technical designs, supporting other developers on the team with their deliverables, and participating in Agile (SCRUM) ceremonies.
  • Participate in problem identification and requirement gathering by working closely with product owners and analysts to understand business and functional requirements.
  • Provide operational and production support for the applications we build and maintain.
  • Work with the Data Science team to profile large data sets, and define and implement analytical models.
  • Develop and refine design patterns, processes, standards, and components for various data engineering use cases.
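
As referenced in the first bullet, here is a minimal sketch of a Glue ETL job of that shape in PySpark. The database, table name, field mappings and S3 path are hypothetical placeholders; a real job would also use job bookmarks, output partitioning and error handling.

```python
# Hypothetical AWS Glue ETL job (PySpark): read raw JSON registered in the Glue
# catalog, apply light transformations, and write Parquet back to the S3 data hub.
# Database, table, mappings and paths are placeholders.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: a table the Glue crawler registered over raw JSON files on S3.
raw = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders_json")

# Rename/cast a few fields; anything not listed in the mapping is dropped.
mapped = ApplyMapping.apply(
    frame=raw,
    mappings=[("order_id", "string", "order_id", "string"),
              ("order_ts", "string", "order_ts", "timestamp"),
              ("amount", "double", "amount", "double")])

# Target: curated Parquet zone on S3, queryable via Athena / Redshift Spectrum.
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-data-hub/curated/orders/"},
    format="parquet")

job.commit()
```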

Confidential, North Andover, USA

Sr. Data Engineer

Responsibilities:

  • Designing and developing real-time data ingestion into the Spark environment on an HDP 2.5 cluster using the Kafka message broker (see the sketch after the environment line below).
  • Configuring various topics on the Kafka server to handle transactions flowing from multiple ERP systems.
  • Developed Sqoop jobs to collect master data from Oracle tables to be stored in Hive tables using the Parquet file format.
  • Used shell scripting in Linux to configure the Sqoop and Hive tasks required for the data pipeline flow.
  • Developed data transformation modules in Python to convert JSON-format files into Spark DataFrames to handle data from legacy ERP systems.
  • Developing scripts using the Spark Streaming API and PySpark for data transformation using Spark DataFrames.
  • Used HiveContext on Spark to store the data in Hive tables on HDFS for ad hoc data analysis and reporting.
  • Implemented SQL-type joins between Spark DataFrames and worked with Spark RDDs to merge the transactional and master files for preparing the final data loads onto MongoDB.
  • Designed and developed various KPI-driven data marts on MongoDB and loaded data from HDFS to MongoDB.
  • Monitored user queries on the data and ran ad hoc queries on MongoDB to answer data-related questions.
  • Performed advanced procedures like text analytics using Python to generate buyer recommendations for Watts products.
  • Involved in the end-to-end development lifecycle from requirement gathering through UAT and deployment.

Environment: HDP 2.5, HDFS, Kafka 0.10.0, Spark 1.6 (Spark Streaming, Spark SQL), Hive 1.2.1, Sqoop 1.4.6, Scala, Python, Shell Scripting, MongoDB 3.4
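
A minimal sketch of the Kafka-to-Spark ingestion described above, using the Spark 1.6 DStream API in PySpark. The broker addresses, topic name, JSON fields and Hive table are hypothetical placeholders.

```python
# Hypothetical Spark 1.6 Streaming job (PySpark / DStream API): consume ERP
# transactions from Kafka, parse JSON into DataFrames, and persist to Hive.
# Brokers, topic and field names are placeholders.
import json
from pyspark import SparkContext
from pyspark.sql import HiveContext, Row
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="erp_ingest_sketch")
ssc = StreamingContext(sc, batchDuration=30)          # 30-second micro-batches
hive_ctx = HiveContext(sc)

stream = KafkaUtils.createDirectStream(
    ssc, ["erp_transactions"],
    {"metadata.broker.list": "broker1:9092,broker2:9092"})

def save_batch(time, rdd):
    """Convert each micro-batch of JSON messages to a DataFrame and append to Hive."""
    if rdd.isEmpty():
        return
    rows = (rdd.map(lambda kv: json.loads(kv[1]))
               .map(lambda d: Row(order_id=d.get("order_id"),
                                  amount=float(d.get("amount", 0)),
                                  source=d.get("source_system"))))
    df = hive_ctx.createDataFrame(rows)
    df.write.mode("append").saveAsTable("staging.erp_transactions")

stream.foreachRDD(save_batch)
ssc.start()
ssc.awaitTermination()
```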

Confidential, Atlanta, USA

Sr. BI Analyst

Responsibilities:

  • Designing and implementing an end-to-end near real-time data pipeline by transferring data from DB2 tables into Hive on HDFS using Sqoop.
  • Accepting and processing the data on border crossing feeds in various formats like CSV, flat file and XML.
  • Experience in handling VSAM files on the mainframe and converting them to a different code page before moving them to HDFS using SFTP.
  • Designing and developing Scala scripts to perform data transformation and aggregation through RDDs on Spark.
  • Designing and implementing the data lake on the NoSQL database HBase with denormalized tables suited to feed the downstream reporting applications.
  • Developing Hive functions and queries for massaging data before loading the Hive tables.
  • Involved in Hive performance tuning by changing join strategies and by implementing indexing, partitioning and bucketing on the transactional data (see the sketch after the environment line below).
  • Extensively worked with Avro and Parquet file formats while storing data on HDFS to be accessed through Hive.
  • Extensively used the Oozie workflow scheduler to automate Hadoop jobs by creating Directed Acyclic Graphs (DAGs) of actions with the necessary flow controls while managing dependencies.
  • Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.

Environment: CDH4.3, HDFS, Hive, Sqoop, Scala, Spark Core, DB2, HBase
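
The performance-tuning bullet above is illustrated below with a minimal PySpark/HiveQL sketch of a partitioned, bucketed table. The project itself used Scala on CDH; the database, table and column names here are hypothetical placeholders.

```python
# Hypothetical illustration of Hive partitioning + bucketing driven from PySpark.
# Database, table and column names are placeholders, not the actual project schema.
from pyspark import SparkContext
from pyspark.sql import HiveContext

sc = SparkContext(appName="hive_tuning_sketch")
hive = HiveContext(sc)

# Partition by crossing date and bucket by port code so common filters and joins
# prune partitions and hit a bounded number of buckets.
hive.sql("""
    CREATE TABLE IF NOT EXISTS bcm.crossings (
        crossing_id   STRING,
        port_code     STRING,
        vehicle_type  STRING,
        crossing_ts   TIMESTAMP
    )
    PARTITIONED BY (crossing_date STRING)
    CLUSTERED BY (port_code) INTO 32 BUCKETS
    STORED AS PARQUET
""")

# Load from the raw staging table using dynamic partitioning.
hive.sql("SET hive.exec.dynamic.partition=true")
hive.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
hive.sql("SET hive.enforce.bucketing=true")
hive.sql("""
    INSERT OVERWRITE TABLE bcm.crossings PARTITION (crossing_date)
    SELECT crossing_id, port_code, vehicle_type, crossing_ts,
           to_date(crossing_ts) AS crossing_date
    FROM bcm_staging.crossings_raw
""")
```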

Confidential, Atlanta, USA

Sr. BI Analyst

Responsibilities:

  • Involved in gathering business requirements, interacting with business users and translating the requirements into ETL high-level and low-level designs.
  • Prepared both high-level and low-level design documents; involved in the ETL design and development of the data model.
  • Worked on importing and cleansing high-volume data from various sources like Oracle, SQL Server 2012, flat files and DB2.
  • Developed complex ETL mappings and worked on the transformations like Source qualifier, Joiner, Expression, Sorter, Aggregator, Sequence generator, Normalizer, Connected Lookup, Unconnected Lookup, Update Strategy and Stored Procedure transformation.
  • Implemented Slowly Changing Dimension Type 1 and Type 2 for inserting and updating Target tables for maintaining the history.
  • Worked on loading the data from different sources like Oracle, SQL Server, DB2, flat files (created copybook layouts for the source files) and ASCII-delimited flat files to Oracle targets and flat files.
  • Worked with mapping variables, mapping parameters and workflow variables, implementing SQL scripts and shell scripts in Post-Session and Pre-Session commands in sessions.
  • Integration of the Salesforce REST API with Informatica Cloud services.
  • Built the data lake on Amazon S3 and Redshift for analytics (see the sketch after this list).
  • Wrote SQL*Loader scripts for preparing the test data in Development, TEST environment and while fixing production bugs.
  • Used the debugger to identify processing bottlenecks and performed Informatica performance tuning to increase the throughput of the workflows.
  • Created ETL deployment groups and ETL Packages for promoting up to higher environments.
  • Involved in various phases of the software development life cycle right from Requirements gathering, Analysis, Design, Development, and Testing to Production.
  • Performed and documented the unit testing for validation of the mappings against the mapping specifications documents.
  • Performed production support activities in Data Warehouse (Informatica), including monitoring and resolving production issues, following up on information requests and bug fixes, and supporting end users.
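
As referenced in the data lake bullet, a minimal sketch of loading curated S3 data into Redshift with a COPY command issued from Python follows. The cluster endpoint, credentials, bucket, IAM role and table name are hypothetical placeholders.

```python
# Hypothetical example of loading curated S3 data into Redshift via COPY.
# Host, credentials, bucket, IAM role and table names are placeholders.
import psycopg2

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="analytics", user="etl_user", password="***")
conn.autocommit = True

copy_sql = """
    COPY analytics.sales_fact
    FROM 's3://example-data-lake/curated/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

with conn.cursor() as cur:
    cur.execute(copy_sql)

conn.close()
```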

Confidential

Sr. Developer - Informatica & Business Objects

Responsibilities:

  • Understanding the requirements through BRD (Business Requirement Document).
  • Interacting with BAs to understand the functional requirements.
  • Created the data warehouse from multiple source systems like CDD, COLT, SCOT, CASS II, CDP etc.
  • Analyzing the requirement & designing the Informatica mappings, sessions and workflows.
  • Created more than 15 dimensions and 6 fact tables with a star schema.
  • Created a couple of reusable mapplets and worklets.
  • Designed Universe from scratch, migrating to different environments, designing WebI reports, scheduling of reports and maintaining them on Business Objects.
  • Unit testing of Universes and Reports through data (request) creation on the application and matching it with report data for specific request IDs.
  • Interacting wif QA team for testing.
  • Ensured smooth implementation of the requirements by having regular discussions with the team members and users.
  • Worked on creation of Universe Designer and Web Intelligence Reporting for all the project applications.
  • Unit, Integration and Performance Testing.
  • Designed and developed a new Business Objects universe which has a Sybase database as its data source.
  • Root Cause Analysis for the issues raised in production and suggesting solutions and actual participation in resolving Production issues.
  • Production support for each production deliverable, validations and continuous follow-ups with users to confirm the proper working of code in the production environment.
  • Involved in major change requests
  • Enhancing Business Objects Universes depending on requirements
  • Designed Reports based on universe using Business Objects XI 3.1
  • Created deliverable Design Document for Universe and Reports.
  • Developed reports using multiple data providers, @prompts, Variables and Sections
  • Resolved Loops in Universe by creating Aliases.

Confidential

ETL Designer and Developer

Responsibilities:

  • Understanding requirements from the BRD and proposing designs as per requirements.
  • Preparing Effort Estimation as per the requirement. Preparing Analysis & Design document and Unit Test Plans according to business logic.
  • Understanding & creating source and target definitions, Mappings, Transformations and working wif Sessions. Performing Code and Document Reviews.
  • Performing Unit testing and preparing deliverables.
  • Involved in Code Deployment in another environment.
  • SIT support, UAT support and post-production support.

Confidential

Informatica Developer

Responsibilities:

  • Incorporated Filter, Expression and Update Strategy transformations, etc., across mappings.
  • Worked with models having both one-time loads and incremental loads based on a HIGH WATER MARK table (see the sketch after this list).
  • Developed sessions and workflows and implemented the SET, CLEAR, LOCK WAIT and LOCK RELEASE concepts over the sessions.
  • With the help of DA scripts, loaded the base tables by adding a post-load session command with the file name.
  • Responsible for data validation and to check whether the source data is loaded properly to the target tables.
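
The HIGH WATER MARK bullet above is sketched below in PySpark for illustration (the project implemented it in Informatica). The control, source and target table/column names are hypothetical placeholders.

```python
# Hypothetical high-water-mark incremental load expressed in PySpark; the project
# itself used Informatica. Control/source/target names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder.appName("hwm_incremental_sketch")
         .enableHiveSupport().getOrCreate())

# 1. Read the last successful watermark for this feed from the control table.
last_ts = (spark.table("etl_ctl.high_water_mark")
                .filter("table_name = 'orders'")
                .agg(F.max("last_loaded_ts"))
                .collect()[0][0])

# 2. Extract only the source rows stamped after the watermark (incremental load).
delta = spark.table("src.orders").filter(F.col("updated_ts") > F.lit(last_ts))

# 3. Append the delta to the target table.
delta.write.mode("append").saveAsTable("dw.orders")

# 4. Advance the watermark (append-only control log; step 1 reads the max value),
#    mirroring the post-load step that updates the HIGH WATER MARK table.
new_ts = delta.agg(F.max("updated_ts")).collect()[0][0]
if new_ts is not None:
    (spark.createDataFrame([("orders", new_ts)], ["table_name", "last_loaded_ts"])
          .write.mode("append").saveAsTable("etl_ctl.high_water_mark"))
```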

Confidential

Informatica Developer

Responsibilities:

  • The project is approached as an SCD Type 2 model: SOURCE to AUDIT using Informatica, then surrogate key generation before moving from AUDIT to LRF, and finally to Nexus Base Data Allegro and Teradata.
  • Worked with Dell data sources like Dellserv in this project across 3 different regions: AMER, EMEA and APJ.
  • Incorporated Filter, Expression and Update Strategy transformations, etc., across mappings.
  • Worked with models having both one-time loads and incremental loads based on a HIGH WATER MARK table.
  • Developed sessions and workflows and implemented the LOCK and WAIT concepts over the sessions.
  • With the help of DA scripts, loaded the base tables by adding a post-load session command with the file name.
  • Responsible for data validation and to check whether the source data is loaded properly to the target tables.

Confidential

ETL Developer

Responsibilities:

  • Involved in Design Specifications and technical Sessions.
  • Understood the business point of view to implement the coding using Informatica PowerCenter Designer.
  • Experience with high-volume datasets from various sources like Oracle tables, text files, XML and relational tables.
  • Interfaces are built and automated with the Informatica ETL tool, PL/SQL and Unix shell scripting.
  • Experienced in using Informatica integrated with web services.
  • All the jobs are integrated using complex mappings, including mapplets and workflows, using Informatica PowerCenter Designer and Workflow Manager.
  • Performance tuning has been done to increase the throughput at both the mapping and session levels, along with SQL query optimization.
  • Automated the jobs through scheduling using the built-in Informatica scheduler, which runs every day while maintaining the data validations.
  • Performed Informatica code migrations; tested, debugged and documented to maintain programs and deployments.
  • Experienced in analyzing source data for metadata across multiple repositories.
  • Provided solutions to the end users of multiple applications.
  • Synchronization has been done between Informatica metadata and the data modeling diagrams.
  • Provided support and quality validation through test cases for all stages of implementation.
  • Post Production support is done to resolve Issues.
