Senior Engineer Resume
USA
SUMMARY:
- 10+ years of experience in Information Technology developing large-scale Data Warehouse, Data Lake, Big Data and Analytics solutions. Specialized in Big Data architecture and large-scale distributed data processing for Big Data analytics using the Hadoop ecosystem.
- Experienced in all aspects of the software development life cycle, including analysis, design, development, modelling, testing, debugging, deployment and operational support of significant Data Integration, Analytics and Machine Learning projects (Big Data, DW, BI) in Waterfall as well as Scrum/Agile methodologies.
- Experienced in developing frameworks that standardize batch data ingestion, data processing and delivery of massive datasets.
- Good exposure to and understanding of the Finance, Retail & Insurance domains.
- Expertise in the Hadoop ecosystem: HDFS, Pig and HBase; Hive for analysis; TDCH for data migration; Sqoop for data ingestion; and Oozie for scheduling.
- Good experience working with the Hortonworks Hadoop distribution.
- Involved in designing Hive schemas, using performance tuning techniques such as partitioning and bucketing.
- Optimized HiveQL/Pig scripts by using execution engines such as Tez (see the sketch at the end of this summary).
- Experience in writing ad hoc queries to move data from HDFS to Hive and analyzing the data using HiveQL.
- Experience in building Pig scripts to extract, transform and load different file formats (TXT, XML, JSON) onto HDFS, HBase and Hive for data processing.
- Experience in ingesting data to different tenants in the Data Lake and creating snapshot tables for consumption. Experience working with HBase tables, with a good understanding of filters.
- Worked on Sqoop jobs with incremental load to populate Hive external tables.
- Very good understanding of partitioning and bucketing concepts in Hive. Designed and managed external tables in Hive to optimize performance.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems.
- Used HBase in conjunction with Pig/Hive as and when required for real-time, low-latency queries.
- Excellent knowledge & experience in the Data Warehouse development life cycle, dimensional modelling, repository & administration, and implementation of Star and Snowflake schemas.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from various sources into Data Warehouses and Data Marts using DataStage 7.5/8.5.
- Experience in integration of various data sources like Oracle, DB2 UDB, Teradata, XML & Flat files into the Staging Area.
- Handled huge volumes of data using Teradata utilities such as BTEQ, FastLoad, MultiLoad & TPump.
- Reviewed SQL to ensure efficient execution plans and adherence to CPU performance thresholds. Worked closely with MicroStrategy architects to adjust SQL as necessary.
- Experience in project management: planning, task identification, tracking, resourcing & reporting.
- Created & deployed work orders & resolved problem request tickets on the HPSM/Remedy tools and ServiceNow to track incidents. Worked with version control and CI tools such as MS TFS, GitHub and Jenkins.
- Mentors and assists junior architects, business analysts and developers. Technical architect responsible for defining the design patterns used to load data into the DWH (ETL & ELT).
- Led a team of 5 to 10 resources & developed and guided test strategies and plans for Unit, Integration, System and UAT testing.
- Expert in designing ETL architecture and star-schema and snowflake dimensional models.
- Extensively worked on and implemented parallelism and partitioning techniques in DWH applications.
- Partners with business line managers and Centre of Excellence leaders, providing technical and system expertise, to align IT strategic direction with product concepts.
- Strong design skills in logical and physical data modelling using the Erwin tool, and in metadata management for OLTP (relational) and OLAP (dimensional) systems. Proficient in data normalization and de-normalization techniques.
- Very good experience in customer specification study, requirements gathering, system architectural design and turning the requirements into final product/service.
- Familiar with building end-to-end solutions, onshore-offshore and multi-site development projects, system integration, migration, maintenance and support projects.
- Proven organizational, time management and multi-tasking skills, with the ability to work independently, quickly learn new technologies and adapt to new environments.
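
To illustrate the partitioning, bucketing and Tez tuning mentioned above, here is a minimal sketch of a wrapper script; the database, table and column names (sales_db, sales_fact, sales_landing, and so on) are hypothetical placeholders, not taken from any actual engagement.

#!/bin/bash
# Illustrative sketch only: create a partitioned, bucketed ORC table in Hive
# and load it from a landing table with Tez as the execution engine.
# All database, table and column names below are hypothetical.
set -euo pipefail

hive -e "
SET hive.execution.engine=tez;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
-- Older Hive versions may also need: SET hive.enforce.bucketing=true;

CREATE TABLE IF NOT EXISTS sales_db.sales_fact (
    order_id    BIGINT,
    customer_id BIGINT,
    sale_amount DECIMAL(12,2)
)
PARTITIONED BY (sale_date STRING)           -- enables partition pruning by date
CLUSTERED BY (customer_id) INTO 32 BUCKETS  -- bucketing for join/sampling efficiency
STORED AS ORC;

-- Load from the landing table, creating date partitions dynamically.
INSERT OVERWRITE TABLE sales_db.sales_fact PARTITION (sale_date)
SELECT order_id, customer_id, sale_amount, sale_date
FROM   sales_db.sales_landing;
"

Partitioning keeps date-bounded queries from scanning the full table, while bucketing on the join key helps bucketed map joins and sampling.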
TECHNICAL SKILLS:
Hadoop Ecosystem: Hive, HBase, Pig, Kafka, MapReduce, Sqoop, Oozie, Python
ETL Tools: IBM WebSphere DataStage and Quality Stage 8.7
DataStage (Ver 7.x, 8.x, 11.3): Manager, Administrator, Designer, Director
Data Modelling: Dimensional Data Modelling, Star Schema Modelling, Snowflake Modelling, FACT and Dimension tables, physical and logical data modelling, Erwin 4.1.2/3.x
Operating Systems: Linux, Unix (Solaris, AIX), Windows 95/98/NT/2000/XP, Windows 7
Databases: Oracle 10g/9.x/8.x, SQL Server, DB2, Teradata, PostgreSQL
Tools: MS Access, MS PowerPoint, Visio, SharePoint, PuTTY, Telnet, WinSCP, Oracle, AQT, Teradata SQL Assistant
Scheduling: Control-M, Tivoli, AutoSys
Agile Tools: NinJira, VersionOne
Programming Languages: SQL, UNIX/Linux Shell Scripting
Version Control Tools: MS TFS, GitHub
PROFESSIONAL EXPERIENCE:
Confidential, U.S.A.
Senior Engineer
Responsibilities:
- Experienced with requirements gathering, project planning, architectural solution design and the development process in an agile environment.
- Prepared a Hadoop framework to frequently bring in data from the source and make it available for consumption.
- Worked with Sqoop and Hive across the Hadoop data landscape. Created Hive scripts to build foundation tables by joining multiple tables.
- Experience in ingesting data to different tenants in Data Lake.
- Responsible for building scalable distributed data solutions using the Hadoop ecosystem.
- Facilitated meetings with business and technical teams to understand requirements, translate them into technical solutions and implement them on the Hadoop platform.
- Reviewed requirements, developed technical design documentation, validated test scripts and coordinated implementation into production, along with support activities.
- Working experience designing and implementing complete end-to-end Hadoop infrastructure including Hive, Sqoop and HDFS.
- Created partitioned tables using the ORC file format in Hive for better performance and faster querying, and handled huge volumes of data.
- Created Sqoop jobs with incremental load to populate Hive tables (see the sketch at the end of this list).
- Experience in optimizing Hive queries to improve performance. Performed extensive data analysis using Hive.
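
The Sqoop incremental-load pattern referenced above might look like the following sketch; the JDBC URL, credentials, ORDERS table and ORDER_ID column are hypothetical placeholders.

#!/bin/bash
# Illustrative sketch of a Sqoop incremental-append job: pull only new rows
# from an RDBMS table into the HDFS directory backing a Hive table.
# Connection details, table and column names are hypothetical.
set -euo pipefail

# Create a saved Sqoop job so the last imported value is tracked between runs.
sqoop job --create orders_incremental -- import \
  --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
  --username etl_user \
  --password-file /user/etl/.db_password \
  --table ORDERS \
  --target-dir /data/landing/orders \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 0 \
  --num-mappers 4

# Each execution imports only rows with ORDER_ID greater than the stored
# last-value; Sqoop updates that value in its metastore after a successful run.
sqoop job --exec orders_incremental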
Confidential, USA
Sr BI Engineer
Environment: Hive, Oozie Coordinator, HBase, Pig, Shell Scripting, Sqoop, Control-M, ServiceNow, GitHub, Jenkins, QuerySurge, DataStream, NinJira, Teradata, Python, Kafka
Responsibilities:
- Worked with Sqoop, Hive and the HDFS file system for Hadoop archival from Teradata. Experience in using SequenceFile and ORC file formats.
- Created Hive scripts to build the foundation tables by joining multiple tables.
- Handled the source change from IMN and PRISM to CORONA (a Kafka feed), which is ingested into a Hive table in JSON format and parsed into Hive tables using Hive functions (see the sketch at the end of this list).
- Used the TDCH utility for data ingestion from Hive to Teradata.
- Created partitioned tables using the ORC file format and handled huge volumes of data from landing to foundation Hive tables.
- Developed Oozie workflows & Oozie coordinators for scheduling and orchestrating the ETL process. Involved in scheduling the Oozie workflow engine to run multiple Hive jobs.
- Created jobs to transfer data across different systems using DataStream for the initial source data analysis.
- Designed and developed Sqoop full and incremental load scripts to bring data from heterogeneous source systems.
- Created Sqoop jobs with incremental load to populate Hive tables.
- Designed & developed the wrapper scripts and Oozie XML workflows. Created the Oozie coordinators for Hadoop code execution.
- Validated the data ingested into Hive tables. Created partitioned Hive internal tables.
- Prepared an ETL framework using Sqoop, Pig and Hive to frequently bring in data from the source and make it available for consumption.
- Deployed executable code to multiple servers through a CI/CD process, using GitHub for code versioning and Jenkins for builds.
- Analyzed & understood the ETL/ELT Merchandising ITEM application data and the existing ETL/ELT code end-to-end process.
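
As a sketch of the JSON parsing step above, assume a hypothetical landing table that stores each Kafka message as a single JSON string column; Hive's get_json_object pulls typed fields out into a partitioned foundation table. All database, table and field names here are illustrative.

#!/bin/bash
# Illustrative sketch: parse JSON messages landed from a Kafka feed into typed
# columns of a foundation Hive table. Names such as landing_db.item_events_raw
# and json_line are hypothetical.
set -euo pipefail

hive -e "
SET hive.execution.engine=tez;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE foundation_db.item_events PARTITION (event_dt)
SELECT
    get_json_object(json_line, '\$.item_id')                       AS item_id,
    get_json_object(json_line, '\$.item_desc')                     AS item_desc,
    CAST(get_json_object(json_line, '\$.price') AS DECIMAL(10,2))  AS price,
    to_date(get_json_object(json_line, '\$.event_ts'))             AS event_dt
FROM landing_db.item_events_raw;
"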
Confidential, USA
Sr BI Engineer
Environment: Hive, HBase, Oozie Coordinator, Shell Scripting, Sqoop, QuerySurge, ServiceNow, GitHub, Jenkins, NinJira, Teradata, Python
Responsibilities:
- Analyzed & understood the new VMM services for Business Partner and Brand data, as well as the existing ETL code end-to-end process.
- Extracted data using the Sqoop tool and ingested it into the Hive database for initial data loads.
- Worked with Sqoop, Hive and the HDFS file system for Hadoop archival from Teradata. Experience in using SequenceFile, RCFile and ORC file formats.
- Created and ran Sqoop jobs with incremental load to populate Hive external and managed tables.
- Designed & developed the wrapper scripts, Oozie workflows and Oozie coordinators (see the sketch at the end of this list).
- Created partitioned/bucketed tables using the ORC file format and handled huge volumes of data from landing to foundation Hive tables. Optimized Hive scripts' use of HDFS through various compression mechanisms.
- Wrote UDFs as dictated by specific requirements. Managed and reviewed Hadoop log files; performed file system management and monitoring.
- Performed implementation & post-implementation support activities in different Hadoop environments. Designed the Oozie coordinator flow for Business Partner and Brand flow execution. Deployed executable code to multiple servers through a CI/CD process, using GitHub for code versioning and Jenkins for builds.
- Created partitioned Hive internal tables and used the TDCH utility for data ingestion from Hive to Teradata.
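
The Oozie coordinator deployment described above could be scripted roughly as below; the HDFS paths, host names and property values are hypothetical placeholders, and coordinator.xml/workflow.xml are assumed to already exist.

#!/bin/bash
# Illustrative sketch: deploy and submit an Oozie coordinator that schedules a
# Hive-based workflow. Host names, ports and HDFS paths are hypothetical.
set -euo pipefail

APP_DIR=/apps/brand_load                   # HDFS app dir for the XML definitions
OOZIE_URL=http://oozie-host:11000/oozie

# Push the latest coordinator/workflow definitions to HDFS.
hdfs dfs -mkdir -p "${APP_DIR}"
hdfs dfs -put -f coordinator.xml workflow.xml "${APP_DIR}/"

# job.properties points oozie.coord.application.path at the coordinator app.
cat > job.properties <<EOF
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8050
oozie.use.system.libpath=true
oozie.coord.application.path=\${nameNode}${APP_DIR}
EOF

# Submit and start the coordinator; Oozie then materializes workflow runs
# on the schedule defined in coordinator.xml.
oozie job -oozie "${OOZIE_URL}" -config job.properties -run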
Confidential, USA
Associate Technical Architect
Environment: IBM InfoSphere DataStage (version 8.5), Teradata, UNIX/Linux, Flat Files, Control-M, QC, HPSD tool, QuerySurge, Erwin, Jira
Responsibilities:
- Understanding the existing architecture and flow dependency of Merchant Analytics Application.
- Working on fixing the production issues and defects.
- Exploring tuning opportunities in ETL & DB for better response time.
- Developing high level design documents and templates to follow a specific design approach.
- Worked on Teradata load utilities (BTEQ, FastExport, FastLoad, MultiLoad and TPT), which were mainly used to extract data from files/tables and load it into tables/files (see the sketch at the end of this list).
- Involved in development activities to design and develop DataStage jobs.
- Led a 6-member offshore team through development, execution and implementation activities.
- Involved in fixes for bugs identified during production runs within the existing functional requirements.
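
A minimal sketch of a shell wrapper around BTEQ of the kind referenced above; the logon string, database, table and file names are placeholders, and credentials would normally come from a secured profile rather than the script itself.

#!/bin/bash
# Illustrative sketch: export yesterday's rows from a Teradata table to a flat
# file via BTEQ. tdprod, edw.daily_sales and the file path are hypothetical.
set -euo pipefail

bteq <<'EOF'
.LOGON tdprod/etl_user,etl_password

.EXPORT REPORT FILE = /data/extracts/daily_sales.dat

SELECT store_id, sale_date, sale_amount
FROM   edw.daily_sales
WHERE  sale_date = CURRENT_DATE - 1;

.IF ERRORCODE <> 0 THEN .QUIT 8

.EXPORT RESET
.LOGOFF
.QUIT 0
EOF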