Senior Data Engineer Resume
SUMMARY
- Over 14 years of professional IT experience spanning multiple technologies, with expertise in data engineering and the design and development of various applications.
- Experienced in big data, data warehousing, data modeling, and automation for projects involving data ingestion, transformation, and processing.
- Experienced in data processing with Hadoop ecosystem tools such as HDFS, Hive, Sqoop, and Spark SQL (with Scala and PySpark), and with ETL tools such as Informatica PowerCenter alongside SQL Server, Oracle, Teradata, and Unix.
- Experienced in handling large datasets using partitioning, Spark in-memory capabilities, and effective, efficient joins and transformations applied during the ingestion process itself.
- Experienced in advanced SQL on RDBMS platforms (DB2, Oracle, and Teradata) and in developing Hive scripts with Hive UDTFs and HiveQL for data processing and end-user analytics.
- Well-versed in importing and exporting data with Sqoop between HDFS and relational database management systems (RDBMS).
- Worked on automating workflows and scripts using shell scripting and Python.
- Experienced with Amazon Web Services (AWS), using Redshift for querying and S3 for storage.
- Worked with different storage systems, including HDFS and S3.
- Worked extensively with Agile methodology, including iteration planning, sprints, retrospectives, and backlog planning.
- Worked with continuous integration and continuous delivery/deployment tools such as TeamCity, Ops logical, and Bamboo, with GitHub and Bitbucket for source control.
- Participated in business/client meetings for design, development, and requirements discussions, and provided solutions for complex scenarios.
- Able to assess business rules, collaborate with stakeholders, and perform source-to-target data mapping; prepared LLD and technical specification documents.
- Followed the coding and design standards, guidelines, and tool selections specified by the client.
- Involved in all phases of unit testing, SIT, UAT, and support.
- Proven ability to work independently and as part of a team, leading on the technical front and mentoring team members.
- Onsite experience interacting with business users/stakeholders to analyze business rules and requirements in banking and other domains.
TECHNICAL SKILLS
Big Data - Data Management: YARN
Data Processing: Hadoop (Hive), Spark (Spark SQL with Scala and PySpark)
Data Ingestion: Sqoop, Nifi
Workflow Schedulers: Tivoli, Autosys, Oozie and Control Center
Hadoop Distribution: Apache, Hortonworks and Cloudera
Visualization: Zeppelin, Hue and Impala
Continuous Integration/Continuous Deployment: TeamCity and Bamboo
Scripting Languages: Python and Unix Shell Scripting
Programming Languages: Python, Scala
Operating Systems: Linux (Ubuntu), Windows
Cloud Computing: Amazon Web Services (AWS)
Databases/Database Tools: Oracle 10g/9i, SQL Server (Microsoft SQL Server 2012), Teradata (Toad for Data Analytics 3.0)
IDEs, Build Tools: Eclipse/IntelliJ/PyCharm and SBT
ETL Tools: Informatica PowerCenter 9.5.1/9.1/8.6
Emulators and FTP: PuTTY, WinSCP, FileZilla and Git Bash; Hosting service (VCS)
PROFESSIONAL EXPERIENCE
Confidential
Senior Data engineer
RESPONSIBILITIES:
- Collaborated with the Salesforce team to understand the architecture in place.
- Worked with the client to understand business needs and translate them into actionable transformations, saving 17 hours of manual work each week.
- Designed the data pipeline architecture for implementing the new product, which was quickly appreciated and endorsed by clients.
- Performed extensive system analysis to map business requirements to the application and worked closely with the business and analytics teams to gather system requirements.
- Handled heterogeneous data sources such as RDBMS, the Salesforce API, Microsoft SharePoint, and different file formats.
- Used Python REST API calls, particularly GET and POST requests, to create, monitor, and fetch extraction jobs that pull data from the Salesforce APIs and Microsoft SharePoint into AWS S3 (see the sketch after this list).
- Used AWS Redshift to query large volumes of data stored on S3 to create a virtual data lake.
- Used Spark to load the extracted data into DataFrames and the DataFrameWriter API to persist the results.
- Analyzed the existing SQL scripts and designed the solution for a custom ETL pipeline developed in Python.
- Loaded the transformed business data once the Hive SQL transformations were complete.
- Developed shell scripts and Python programs to automate day-to-day data flow tasks.
- For the CI/CD process, developed code in the PyCharm IDE, used Bitbucket for code integration, and used Bamboo for deployment to the various environments.
- Determined feasible solutions and made recommendations post-production.
- Directed less experienced team members and coordinated systems development tasks on small-to-medium-scope efforts or on specific phases of larger projects.
- Participated with other development, operations, and technology staff, as appropriate, in overall systems and integration testing on small-to-medium-scope efforts or on specific phases of larger projects.
- Prepared test data and executed the detailed test plans. Completed any required debugging.
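A minimal sketch of the GET-and-land pattern described above, using the requests and boto3 libraries; the endpoint, query, bucket, and key names are placeholders, not the actual project values.

```python
import requests
import boto3

# Hypothetical placeholders - not the actual project endpoint, bucket, or query.
API_URL = "https://example.my.salesforce.com/services/data/v52.0/query"
S3_BUCKET = "example-landing-bucket"
S3_KEY = "salesforce/accounts/accounts.json"

def extract_to_s3(session_token: str) -> None:
    """Pull a result set from a REST API with a GET request and land the raw payload on S3."""
    response = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {session_token}"},
        params={"q": "SELECT Id, Name FROM Account"},
        timeout=60,
    )
    response.raise_for_status()

    # Write the raw JSON payload to the S3 landing zone for downstream processing.
    s3 = boto3.client("s3")
    s3.put_object(Bucket=S3_BUCKET, Key=S3_KEY, Body=response.content)
```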
Confidential
Squad Lead and Senior Data engineer
RESPONSIBILITIES:
- Working at the client location as senior data engineer/squad lead, coordinating Joint Application Development (JAD) sessions with solution designers, business analysts, and business stakeholders to perform data analysis and gather business requirements.
- Developed technical specifications of the ETL process flow.
- Involved in the design approach, end-to-end coding, peer review, unit testing, and production deployment.
- Fetched data using Scala pipelines from the SAP, HLS, CC, and COMSSEE source systems and persisted it in Spark.
- Performed end-to-end architecture and implementation assessments of various AWS services, including Amazon EMR, Redshift, and S3.
- Wrote Spark programs in Scala with the IntelliJ IDE for the transformations between source and target.
- Developed Spark SQL code to process data with Apache Spark on Amazon EMR, performing the necessary transformations based on the source-to-target mappings.
- Tested development code in the Spark shell by creating RDDs and DataFrames.
- Leveraged AWS S3 as the storage layer for HDFS.
- Implemented zero data loss by reading and updating offsets persisted in HDFS.
- The system runs on production clusters 24x7, processing around 4 million records every day.
- Implemented DQ validation and persisted invalid records to an HDFS location.
- Applied validation, business rules, and transformations on top of the raw data and populated the validated/transformed data in Hive.
- Used the Spark SQL DataFrame API and aggregate functions extensively to transform raw data into meaningful data for visualization (see the sketch after this list).
- Provided technical assistance to the team and managed the team to achieve the goals of each sprint.
- Implemented various optimizations in the Spark applications to improve performance, tuning memory, executors, and cores.
- Used Autosys integration for running the jobs on the cluster.
- Worked on CI/CD to facilitate seamless integration and deployment, using GitHub, TeamCity, and a control framework called WorkFlow Tables.
- Followed Agile methodology, participating in daily stand-ups, technical discussions with business counterparts, sprint planning, and scrum meetings, while adhering to Agile principles and delivering quality code.
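A minimal PySpark sketch of the DQ-validation and aggregation flow described above; the paths, table, column names, and rules are illustrative assumptions, not the actual project objects.

```python
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("dq_and_aggregation_sketch")
         .enableHiveSupport()
         .getOrCreate())

# Raw extract landed by the ingestion pipelines (hypothetical path and schema).
raw = spark.read.parquet("s3a://example-bucket/raw/transactions/")

# Example DQ rule: amount must be present and non-negative.
dq_rule = F.col("amount").isNotNull() & (F.col("amount") >= 0)

valid = raw.filter(dq_rule)
invalid = raw.filter(~dq_rule)

# Persist rejected records to an HDFS location for investigation.
invalid.write.mode("append").parquet("hdfs:///data/rejects/transactions/")

# Aggregate the validated data into a reporting-friendly shape and load it into Hive.
summary = (valid.groupBy("account_id", "txn_date")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("txn_count")))

summary.write.mode("overwrite").saveAsTable("refined_db.txn_daily_summary")
```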
Confidential
Solution Designer and Senior Developer
RESPONSIBILITIES:
- Worked as technical lead for the CBA squad on the technical implementation of the architecture design based on the ARCA document, alongside the architect and solution designer.
- Gathered business requirements from business stakeholders to determine feasibility and converted them into technical tasks in the design document for each product involved in retail banking.
- Developed a parameterized Sqoop template exclusively for creating Parquet tables in Hive from SQL Server (see the sketch after this list).
- Developed Scala scripts and UDFs using both the DataFrame/Spark SQL and RDD APIs in Spark, reading from and writing data back into Hive tables.
- Developed an automated system in Scala and Spark for picking up files and creating tables, reducing cycle time for the business/clients; it is now used as a template by the entire analytics team.
- Implemented transformation logic in Spark with DataFrames, handling structured and semi-structured data.
- Moved processed files to an archive directory located in an Amazon S3 bucket.
- Created datasets for various time frames across all data sources and loaded them into the target Hive tables.
- Created many UDFs in Scala so that repeated transformations could be reused across different code paths, keeping design timings in mind.
- Integrated Tableau dashboards with Hive tables for automatic refresh once the data load completes.
- Used Autosys integration for running the jobs on the cluster.
- Worked on CI/CD to facilitate seamless integration and deployment, using GitHub, TeamCity, and a control framework called Houston.
- Followed Agile methodology, participating in daily stand-ups, technical discussions with business counterparts, sprint planning, and scrum meetings, while adhering to Agile principles and delivering quality code.
- Involved in the design approach, end-to-end coding, peer review, unit testing, and production deployment.
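A minimal sketch of how the parameterized Sqoop-to-Parquet import might be wrapped; the JDBC URL, table, and mapper count are parameters, and the connection details shown are placeholders (credentials and driver options are omitted).

```python
import subprocess

def sqoop_sqlserver_to_hive_parquet(jdbc_url: str, source_table: str,
                                    hive_db: str, hive_table: str,
                                    mappers: int = 4) -> None:
    """Run a parameterized Sqoop import that lands a SQL Server table
    as a Parquet-backed Hive table."""
    command = [
        "sqoop", "import",
        "--connect", jdbc_url,        # e.g. jdbc:sqlserver://host:1433;databaseName=sales
        "--table", source_table,
        "--hive-import",
        "--hive-database", hive_db,
        "--hive-table", hive_table,
        "--as-parquetfile",           # store the imported data as Parquet
        "--num-mappers", str(mappers),
    ]
    subprocess.run(command, check=True)
```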
Confidential
Technical Lead
RESPONSIBILITIES:
- Performed data ingestion using Sqoop to ingest tables from the DB2 and Oracle source systems.
- Developed Hive tables on top of the resulting flattened data, stored as ORC files to enable quick read times.
- Created a Hive UDTF to extract information from XML into a denormalized, row-based table format that exposes all the atomic/individual data elements.
- Built the Hive UDTF so the same functionality serves both traditional ETL and ELT transformation approaches that feed data to various reporting (ROLAP/MOLAP) tools.
- Created HiveQL scripts to apply business rules and structural transformations and conform the data to the refined database.
- Created shell scripts to run the HQLs, capture and log any reported errors, and report failures back to the calling scripts (a sketch of the same wrapper pattern follows this list).
- Implemented Hive partitioning and bucketing techniques as part of code optimization.
- Created Maestro job schedules in Tivoli for UAT and PROD.
- Acted as technical lead and guided the offshore team to ensure on-time delivery.
- Provided the design approach, end-to-end coding, integration, unit testing, and defect fixing.
- Trained the QA and support teams on the Hive operations involved.
- Handled deployment and supported all post-production activities.
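The original wrappers were shell scripts; below is a minimal Python sketch of the same run-HQL-and-capture-errors pattern, with the log file name and script path as illustrative assumptions.

```python
import logging
import subprocess
import sys

# Hypothetical log destination for the wrapper.
logging.basicConfig(filename="hql_runs.log", level=logging.INFO)

def run_hql(script_path: str) -> None:
    """Run an HQL script through the Hive CLI, log any failure, and
    propagate the exit code back to the calling script/scheduler."""
    result = subprocess.run(["hive", "-f", script_path],
                            capture_output=True, text=True)
    if result.returncode != 0:
        logging.error("HQL script %s failed: %s", script_path, result.stderr)
        sys.exit(result.returncode)   # signal the failure to the caller
    logging.info("HQL script %s completed successfully", script_path)

if __name__ == "__main__":
    run_hql(sys.argv[1])              # e.g. python run_hql.py refine_orders.hql
```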
Confidential
ETL Developer/Tester
RESPONSIBILITIES:
- Coordinated with business analysts and source-team developers to perform data analysis and gather business requirements.
- Analyzed and estimated the work involved in requirement gathering and prepared the design documents.
- Designed the source-to-target mappings, contributed to the selection criteria document, and developed technical specifications of the ETL process flow.
- Worked with PowerCenter client tools such as Source Analyzer, Mapping Designer, Mapplet Designer, and Transformation Developer.
- Created mappings using various Transformations like Source Qualifier, Aggregator, Expression, Filter, Router, Joiner, Lookup, Update Strategy and Sequence Generator
- Developed complex mappings, such as Slowly Changing Dimension Type II with timestamping, in the Mapping Designer (see the sketch after this list).
- Used variables and parameters in the mappings to pass values between mappings and sessions.
- Deployed reusable transformation objects such as mapplets to avoid duplication of metadata, reducing the development time.
- Designed and documented validation rules, error handling, and the unit test strategy of the ETL process.
- Tuned mapping and session performance by resolving source and target bottlenecks and implemented pass-through partitioning.
- Involved in writing UNIX shell scripts to run and schedule batch jobs.
- Documented and presented the production/support documents for the developed components when handing over the application to the production support team.
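The Type II logic here was built in Informatica mappings; as a conceptual illustration only, a PySpark sketch of the same timestamp-based Type II pattern is shown below, using a hypothetical customer dimension, business key, and tracked attribute.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").enableHiveSupport().getOrCreate()

# Hypothetical tables: current dimension rows and today's source extract.
dim = spark.table("refined_db.customer_dim").where("current_flag = 'Y'")
src = spark.table("staging_db.customer_src")

# Source rows whose tracked attribute changed since the last load.
changed = (src.alias("s")
           .join(dim.alias("d"), "customer_id")
           .where(F.col("s.address") != F.col("d.address"))
           .select("s.*"))

# Expire the superseded dimension rows (Type II timestamping).
expired = (dim.join(changed.select("customer_id"), "customer_id")
           .withColumn("end_date", F.current_timestamp())
           .withColumn("current_flag", F.lit("N")))

# Insert the new versions with an open-ended effective period.
new_versions = (changed
                .withColumn("start_date", F.current_timestamp())
                .withColumn("end_date", F.lit(None).cast("timestamp"))
                .withColumn("current_flag", F.lit("Y")))
# expired and new_versions would then be written back to the dimension table.
```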