Big Data Engineer Resume
Sacramento, California
SUMMARY:
- Over 9 years of experience in the IT industry with a strong background in software development solutions for data warehousing.
- Excellent understanding of the full software development life cycle (SDLC) of the ETL process, including requirement analysis, design, development, testing support and migration to production.
- Hands-on experience in executing, optimizing and troubleshooting SQL queries (T-SQL, Hive SQL).
- Experience in performing POCs using AWS and Azure for data warehousing.
- Extensively worked with large Databases in Development and Production environments.
- Experience with advanced ETL techniques including data validation and Change Data Capture (CDC); a representative Hive CDC sketch follows this summary.
- Data extraction, data profiling, data normalization, data exploration, data aggregation and data analysis using tools such as Hive, SQL and advanced MS Excel.
- Experienced in implementing big data technologies: the Hadoop ecosystem (HDFS, MapReduce framework), Sqoop, Oozie and the Hive data warehousing tool.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager and MapReduce concepts.
- Experience with code migration, data migration and extraction/transformation/loading (ETL) involving Teradata and HDFS files on UNIX and Windows.
- Experience with code migration between repositories and folders using the Tortoise SVN Subversion client and Tidal.
- Hands-on experience identifying and resolving performance bottlenecks at various levels such as sources, targets, mappings and sessions.
- Good experience working with clients and business users to build project plans and coordinating with offshore teams on project delivery.
- Extensively worked on identifying bottlenecks and performance tuning Hive and SQL scripts.
- Basic understanding of JCL, COBOL, CA7 and VSAM files.
- Good experience creating dashboards using Tableau.
- Expertise in developing Teradata SQL scripts, procedures and functions to implement business logic.
- Worked with the Teradata RDBMS and utilities such as FastLoad, MultiLoad, TPump and FastExport.
- Good experience working on GitHub.
- Analyzed and resolved incidents raised by application users by priority (low, medium and high) through enterprise support tickets and Jira requests.
- Versatile team player with excellent analytical, presentation and interpersonal skills with an affinity and aptitude to learn new technologies.
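Below is a minimal sketch of the Hive CDC reconciliation pattern referenced above, driven from Python via PyHive. It is illustrative only: the table and column names (base_customers, stg_customers_delta, customer_id, last_updated), connection details, and the window-function reconciliation (a common alternative where Hive ACID MERGE is unavailable) are assumptions, not the actual project code.

```python
# Hedged sketch: CDC reconciliation in Hive driven from Python via PyHive.
# All table, column, and host names below are hypothetical placeholders.
from pyhive import hive

RECONCILE_HQL = """
INSERT OVERWRITE TABLE customers_reconciled
SELECT customer_id, name, email, last_updated
FROM (
    SELECT customer_id, name, email, last_updated,
           ROW_NUMBER() OVER (PARTITION BY customer_id
                              ORDER BY last_updated DESC) AS rn
    FROM (
        SELECT customer_id, name, email, last_updated FROM base_customers
        UNION ALL
        SELECT customer_id, name, email, last_updated FROM stg_customers_delta
    ) merged
) ranked
WHERE rn = 1
"""

def apply_cdc(host="hive-gateway.example.com", port=10000, database="edw"):
    """Union the base table with the latest delta and keep only the newest
    version of each key, writing the result to a reconciled table."""
    conn = hive.connect(host=host, port=port, database=database)
    try:
        cur = conn.cursor()
        cur.execute(RECONCILE_HQL)
        cur.close()
    finally:
        conn.close()

if __name__ == "__main__":
    apply_cdc()
```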
TECHNICAL SKILLS:
Big Data Ecosystems: Hadoop, Map Reduce, HDFS, Hive, Pig, Sqoop, Impala, Spark, Kafka, AWS, Azure
Operating Systems: Windows, UNIX, LINUX
Programming Languages: SQL, HQL, Python, Shell scripting
ETL: SyncSort DMeXpress, HIVE, AWS Glue, Databricks, SSIS
Schedulers: Autosys, Oozie, Tidal, AWS Glue, SSIS
Other Tools: Teradata SQL Assistant, SSH Tectia, Super Putty, Tableau, Visual Studio Code, GitHub, Tortoise SVN, Nexus (NWR), Enterprise Incident Management (IM), Jira, SSMS, SSAS, SSRS
Scripting Languages: UNIX Shell Scripting, Pig Latin, T-SQL
Databases: Teradata, SQL Server
PROFESSIONAL EXPERIENCE:
Confidential, Sacramento, California
Big Data Engineer
Methodology - Agile
Responsibilities:
- Responsible for analyzing DSH's existing traditional data warehouse, which runs on SQL Server, and proposing a suitable cloud solution for migration and ETL.
- Analyzed and compared various AWS and Azure tools to set up the ETL data warehousing process and performed POCs.
- Brainstormed the use of Snowflake for ETL.
- Responsible for performing a POC using AWS tools such as Database Migration Service (DMS), S3, AWS Glue (ETL), Amazon Athena and Tableau; a representative Athena query sketch follows this list.
- Responsible for performing a POC using Azure tools such as Data Factory, Data Lake Storage Gen1, Azure SQL Data Warehouse, Databricks (ETL) and Tableau.
- Conducted sessions on big data tools, languages and modernizing traditional analytics for DSH staff.
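A minimal sketch of the kind of ad-hoc Athena query used in such a POC, using boto3. The region, database, table, and S3 output location are hypothetical placeholders, the Glue Data Catalog table is assumed to already exist (for example via a Glue crawler), and credentials come from the standard boto3 credential chain.

```python
# Hedged sketch: run an ad-hoc Amazon Athena query against an S3-backed table.
# Region, database, table, and S3 result location are hypothetical placeholders.
import time
import boto3

athena = boto3.client("athena", region_name="us-west-2")

def run_query(sql, database="poc_dw", output="s3://poc-athena-results/"):
    """Start an Athena query, poll until it finishes, and return result rows."""
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]

    while True:
        state = athena.get_query_execution(
            QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(2)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {qid} ended in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

if __name__ == "__main__":
    rows = run_query("SELECT claim_type, COUNT(*) AS cnt "
                     "FROM claims_raw GROUP BY claim_type LIMIT 10")
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```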
Environment: Microsoft SQL Server Management Studio 18, Python, AWS, Super Putty, GitHub, Visual Studio Code, Tableau
Confidential
Senior Big Data Engineer and Onshore Lead
Methodology - Agile
Responsibilities:
- Responsible for managing and developing the end-to-end process of preparing the data feed for all automated email marketing campaigns.
- Responsible for working closely with the Adobe Campaign team and campaign owners to gather requirements, analyze source data coming from different sources and design the automated process that prepares the data feed for the trigger campaigns.
- Good experience in data analysis for research and email marketing campaigns.
- Responsible for extracting, transforming and loading marketing data using Hive SQL to perform in-depth analysis of different marketing and research campaigns and predict customer trends, helping product managers revamp the marketing strategy.
- Build custom Hive tables and create logic in Hive SQL based on the requirements of individual product owners to facilitate ad-hoc reporting; a representative feed-preparation sketch follows this list.
- Prepare feeds as per the requirements using Hive for email marketing campaigns.
- Translated requirements into business needs and involved in performing testing to validate the functionalities with adherence to the Hadoop security policies.
- Responsible for managing and coordinating with the offshore team to build the campaigns and schedule regular meetings with offshore and client partners.
- Responsible for defining resources required, assigning tasks, setting goals, tracking and monitoring the progress of tasks for completion.
- Experience in creating complex views for end users to be used for email marketing campaigns.
- Schedule the workflows in Oozie via Tidal and monitor them.
- Worked on fine-tuning the process for performance to ensure that the job completes on time, so the daily campaigns are not affected.
- Worked on identifying and resolving performance bottlenecks by validating data at various levels such as sources, targets, mappings and sessions.
- Responsible for preparing Data Model for Adobe Campaign team.
- Responsible for coordinating with the offshore team to build the campaigns.
- Responsible for performing QA/Unit Testing to validate functionality.
- Responsible for getting required approvals from the concerned team to deploy the code.
- Monitor the job and validate the data post production.
- Created monthly/daily reports and dashboards for clients/business partners as part of the metrics for senior management review using Tableau.
- Worked on identifying bottlenecks and performance tuning Hive scripts.
- Automated source data and latency validation by creating a data governance report to ensure the feeds are sent accurately and as scheduled.
- Worked on ad-hoc code changes to include new products and requirements that improve the email marketing campaigns, as well as on requests raised via Jira tickets.
- Responsible for client and offshore coordination for the project, including status meetings.
- Used GitHub to maintain the code for our campaigns.
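A minimal sketch of the kind of Hive feed-preparation step described above, executed through beeline from Python. The JDBC URL, table names (email_optins, orders, campaign_feed_daily), columns, and join logic are hypothetical placeholders, not the actual campaign feed logic.

```python
# Hedged sketch: build a daily email-campaign feed partition in Hive via beeline.
# JDBC URL, tables, and columns are hypothetical placeholders.
import subprocess
from datetime import date

JDBC_URL = "jdbc:hive2://hive-gateway.example.com:10000/marketing"

FEED_HQL = """
INSERT OVERWRITE TABLE campaign_feed_daily PARTITION (feed_dt='{feed_dt}')
SELECT c.customer_id,
       c.email,
       c.preferred_product,
       MAX(o.last_order_dt) AS last_order_dt
FROM email_optins c
LEFT JOIN orders o
  ON c.customer_id = o.customer_id
WHERE c.opt_in_flag = 'Y'
GROUP BY c.customer_id, c.email, c.preferred_product
"""

def build_feed(feed_dt=None):
    """Run the feed HQL for a given partition date through beeline."""
    feed_dt = feed_dt or date.today().isoformat()
    hql = FEED_HQL.format(feed_dt=feed_dt)
    subprocess.run(["beeline", "-u", JDBC_URL, "-e", hql], check=True)

if __name__ == "__main__":
    build_feed()
```

In practice a wrapper like this would be invoked by the Oozie workflow scheduled via Tidal, so the feed lands before the daily campaign window.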
Environment: Hadoop Ecosystem, HIVE, UNIX, Flat files, XML, Shell Scripting, Super Putty, GitHub, Visual Studio Code, Oozie, Tidal, Tableau, Enterprise Incident Management (IM)
Confidential
Hadoop Developer and Onshore Lead
Responsibilities:
- Analyzing the source data coming from different sources and working with business users and developers to design the DW Model.
- Translated requirements into business rules and made recommendations for innovative IT solutions.
- Implemented the DW tables in a flexible way to cater to future business needs.
- Working closely with user decision makers to develop the transformation logic to be used in DMeXpress tool.
- Responsible for working with offshore team, assigning tasks and scheduling status meetings to implement the project.
- Completed initial data profiling and performed matching/removal of duplicate data.
- Configured, profiled and applied the out-of-the-box data quality rules provided by the product and helped users understand the process along with the data.
- Installed and configured content-based data dictionaries for the data cleansing, parsing and standardization process to address the completeness, conformity and consistency issues identified in the profiling phase.
- Extensively worked on HQL for transformations such as Aggregate, Copy, Join, Merge, Filter, Reformat, Partition, MapReduce, etc.
- Extensively Used Environment SQL commands in workflows prior to extracting the data in the ETL tool.
- Removed bottlenecks at source level, transformation level, and target level for the optimum usage of sources, transformations and target loads.
- Captured erroneous data records, corrected them and loaded them into the target system.
- Interfacing with and supporting QA/UAT groups to validate functionality.
- Used Autosys and Oozie as scheduling tools in the project.
- Scheduled the workflows and monitored them; provided proactive production support after go-live.
- Extracted data from legacy systems, placed it in HDFS and processed it.
- Responsible for ensuring that the code complies with the Hadoop security standards.
- Imported and exported data into HDFS and Hive using Sqoop; a representative Sqoop import sketch follows this list.
- Experienced in managing and reviewing Hadoop log files.
- Load and transform large sets of structured, semi structured and unstructured data.
- Involved in loading data from UNIX file system to HDFS.
- Involved in creating Hive tables, loading them with data and writing Hive queries which run internally as MapReduce jobs.
- Used Oozie as an automation tool for running the jobs.
- Client and offshore coordination for the project including status meetings.
- Worked with business users to create reports and engaged them for testing.
- Good experience working with Hue and Impala for executing ad-hoc queries.
- Experience creating workflows using Visio and the ARIS enterprise modeling tool.
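A minimal sketch of a Sqoop import of a Teradata table into Hive, wrapped in Python for scheduling. The host, database, table, user, and password-file path are hypothetical, and it assumes the Teradata JDBC driver (or an equivalent Teradata connector) is available on the Sqoop classpath.

```python
# Hedged sketch: import a Teradata table into Hive with Sqoop.
# Host, database, table names, and the password-file path are hypothetical
# placeholders; the Teradata JDBC driver is assumed to be on the Sqoop classpath.
import subprocess

def sqoop_import(table, hive_table, split_by):
    """Pull one Teradata table into a Hive table, overwriting any prior load."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:teradata://td-prod.example.com/DATABASE=edw",
        "--driver", "com.teradata.jdbc.TeraDriver",
        "--username", "etl_user",
        "--password-file", "/user/etl_user/.sqoop.pwd",
        "--table", table,
        "--split-by", split_by,
        "--num-mappers", "4",
        "--hive-import",
        "--hive-table", hive_table,
        "--hive-overwrite",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import("CUSTOMER_DIM", "staging.customer_dim", "CUSTOMER_ID")
```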
Environment: Hadoop Ecosystem, DMeXpress (ETL), UNIX, Teradata, Flat files, XML, Shell Scripting, Super Putty, SVN, Teradata SQL Assistant, Oozie, Autosys, Nexus (NWR), Enterprise Incident Management (IM)
Confidential
Hadoop Developer
Methodology - Waterfall
Responsibilities:
- Analyzed various applications and identified functional modifications required as part of this migration.
- Prepared effort estimates and impact analysis for new change requests.
- Provide technical vision and recommend strategies and solutions for projects.
- Tracking and mitigating the risks.
- Prepared high-level and low-level design documents.
- Worked on environment setup for coding as part of the migration.
- Participated in technical reviews with respect to the migration and operations, working toward their completion.
- Performed performance testing and analysis along with the performance metrics.
- Used Autosys to schedule jobs in the project.
- Worked on preparing the test cases, documenting and performing unit testing and System Integration testing and fixing defects.
- Client and offshore coordination for the project including status meetings.
- Responsible for task planning, code inspection, system performance reviews.
- Coordinated with end users to help them validate the data and close defects of severity 1, 2 and 3.
- Effectively managed the test environment for the smooth execution of parallel projects.
- Perform peer review and code walkthrough.
- Assist in deployment and provide technical and operation support during install.
- Involved in post-implementation support.
- Responsible for ensuring quality deliverables within the stipulated timelines.
Environment: Teradata RDBMS 14.10/15.10, Sqoop, Hive, Impala, Teradata SQL Assistant, Super Putty, SVN, DMeXpress, Autosys, Nexus (NWR)
Confidential
Datawarehouse Analyst
Responsibilities:
- Responsible for building policies and ensuring that platform users comply with the Teradata policies.
- Prepare and execute automated compliance reports to find defaulters and educate them on the policies.
- Work with application teams to educate them on Teradata policies and remediate the issues.
- Prepare monthly metrics on each Teradata platform and ensure that platform policy compliance remains green.
- Review the past week's health and status of the production platforms: Production (VA8), RHUB (VA20), ADP (TX16), AHP (VA9) and HTX (TX18), covering CPU usage, AWT usage, query response times with query volume, system trend reports, system resource usage, utility usage, archival usage, utility exhaustion and top CPU users.
- Finding average query response times in different boxes like VA8, VA20 and TX16.
- Provide query optimization and tuning support to End Users.
- Developed an automated tool to capture security-non-compliant users and send email notifications using JCL on the mainframe.
- Experience generating reports for senior management at the bank using PDCR, DBQL and DBC tables, covering requirements such as assessing CPU/IO/space usage for different applications; a representative DBQL query sketch follows this list.
- Identified impact CPU and heavy-hitter queries and advised load teams on tuning poorly performing queries.
- Performance tuning of complex SQL queries that adversely affect the Teradata platform, using PI/SI indexes, join indexes, PPI and EXPLAIN plans: analyzing data distribution among AMPs and index usage, collecting statistics, defining indexes and revising correlated subqueries.
- Experience in generating performance metrics that portrays the overall system health on daily, weekly and monthly basis for senior management at the bank.
- Coordinate with the ETL load team and other development teams for SLA compliance and reporting.
- Build SharePoint sites for easy data management.
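A minimal sketch of the kind of DBQL-based CPU/IO usage report referenced above, using the teradatasql Python driver. It assumes DBQL logging is enabled and read access to DBC.QryLogV; the host, credentials, and seven-day window are hypothetical placeholders, not the bank's actual PDCR reporting.

```python
# Hedged sketch: summarize per-user CPU and I/O from DBQL for the last 7 days.
# Host and credentials are placeholders; assumes read access to DBC.QryLogV.
import teradatasql

REPORT_SQL = """
SELECT  UserName,
        COUNT(*)          AS QueryCount,
        SUM(AMPCPUTime)   AS TotalCPU,
        SUM(TotalIOCount) AS TotalIO
FROM    DBC.QryLogV
WHERE   StartTime >= CURRENT_TIMESTAMP - INTERVAL '7' DAY
GROUP BY UserName
ORDER BY TotalCPU DESC
"""

def top_cpu_users(limit=20):
    """Return the heaviest CPU consumers over the reporting window."""
    with teradatasql.connect(host="tdprod.example.com",
                             user="perf_analyst",
                             password="********") as conn:  # placeholder creds
        with conn.cursor() as cur:
            cur.execute(REPORT_SQL)
            return cur.fetchmany(limit)

if __name__ == "__main__":
    for user, qcount, cpu, io in top_cpu_users():
        print(f"{user:<20} queries={qcount:>8} cpu={cpu:>12.1f} io={io:>14}")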
Environment: Teradata RDBMS 12.0/13.10/14.10, BTEQ, FASTLOAD, MLOAD, TPUMP, Teradata SQL Assistant, Nexus (NWR), Enterprise Incident Management (IM), Mainframe (IBM z/OS), JCL