Sr. Data Engineer Resume
SUMMARY
- A data enthusiast with 11+ years of experience executing data-driven solutions to increase the efficiency, accuracy, and utility of internal data processing. Experienced in building AI and machine learning models, providing data conversion solutions to move data from heterogeneous source systems to target systems, and analyzing data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems.
- Experience in data warehousing and data conversion projects as an ETL consultant/analyst, team leader, and module leader.
- Data analysis and transformations in Python using PySpark, Pandas, NumPy, Seaborn, Matplotlib.
- Experienced in handling different file formats (CSV, JSON, Parquet, TXT, XML, ORC, Avro).
- Experience building different machine learning models (classification, regression, ANN, CNN, NLP, decision trees, K-Means clustering).
- Experience building APIs with the Flask framework and deploying models in a client-server architecture.
- Experience building and deploying AI/ML models in AWS SageMaker.
- Experience building ANN and CNN models using TensorFlow and Keras libraries.
- Experience building sentiment analysis models using NLP.
- End-to-end implementation experience with the ETL tools Spark, Oracle Data Integrator, and Pentaho, and with Netezza DB.
- Experience building mappings and workflows using Informatica BDM.
- Good knowledge of Pentaho for creating transformations and jobs.
- IBM Certified Solution Developer for DataStage v8.5 and v9.1, with work experience in Pentaho, IBM InfoSphere DataStage 9.1, and Oracle Data Integrator (ODI).
- Experience in the onsite-offshore project execution model.
- Experience in Agile and Waterfall development models.
- Experience in the Scrum Master role: conducting daily Scrum stand-up meetings and creating and maintaining stories throughout the sprint.
- Experience attending Scrum of Scrums meetings, maintaining PI-level statistics, and publishing them to stakeholders.
- Extensively used ETL methodology to support data extraction, transformation, and loading processes.
- Experience creating source-to-target data mapping documents based on business requirements.
- Excellent team player with good organizational, communication, interpersonal & analytical skills.
- Strong ability to adapt quickly to new applications, tools, platforms, and languages.
TECHNICAL SKILLS
Operating Systems: Windows NT, UNIX
Languages: Python, Java, SQL, PL/SQL, Unix Shell Scripting
Frameworks: Pandas, NumPy, TensorFlow, Keras, Matplotlib, Seaborn, Selenium WebDriver, Selenium Grid, JUnit, Cucumber
Databases: PostgreSQL, Hive, Netezza, Teradata, DB2, SQL Server 6.5/7/2000, Oracle 10g/11g/12c
Tools: Spark, Pentaho PDI 7.x, Oracle Data Integrator (ODI), DataStage, Informatica BDM, JIRA
File System: HDFS
CI/CD: Jenkins, Maven, Stash, Bitbucket, Git
Project Management: VersionOne
IDE: Jupyter, PyCharm, Toad, DBeaver, PuTTY, Control-M, Eclipse
PROFESSIONAL EXPERIENCE
Confidential
Sr. Data Engineer
Responsibilities:
- Creating Spark scripts to read data from HDFS and load it into target files and a PostgreSQL database.
- Creating DataFrames from HDFS files to perform transformations and load data from source to target.
- Creating and accessing Hive tables to perform transformations in Informatica mappings.
- Monitoring Spark jobs using YARN.
- Using Hue to interact with Hive tables.
- Data analysis and transformations in Python using PySpark, Pandas, NumPy, Matplotlib.
- Processing data from different file formats such as CSV, Parquet, TXT, JSON, and XML.
- Developing Informatica mappings using Informatica BDM (Big Data Management).
- Developing ad hoc SQL queries based on business requests to validate the source data.
- Creating Sqoop scripts to import data into and export data from the HDFS data lake.
- Monitoring data loads periodically to ensure data quality.
- Developed PostgreSQL PL/pgSQL scripts to load data into the target database.
- Developed unit test case documents for moving code to the testing environment.
- Developed implementation plan for code deployment.
Solution Environment: Hadoop, Python, Spark, Hive, YARN, Hue, Sqoop, PostgreSQL, Pandas, Jupyter Notebook, Informatica BDM, Unix, Stash, Bitbucket, Toad, Eclipse, Java, Git, Jenkins, Maven, Control-M
Confidential
Lead Engineer
Responsibilities:
- Created Spark scripts to perform data analytics and load data into the target database.
- Used Spark to read data from different sources and transform it based on business requirements.
- Handled claim request and processing files.
- Analyzed the ETL mapping document.
- Prepared design documents to load base tables from source to target tables.
- Prepared design documents to load dimension and fact tables based on the ETL mapping document.
- Worked with the Product Owner to understand requirements.
- Worked with the data modeling team to design the LLD and HLD.
- Attended sprint planning meetings to prioritize stories for the current sprint based on capacity.
- Developed Pentaho transformations and jobs based on the ETL mapping document.
- Developed a reusable framework job to capture execution logs.
- Extracted data from JSON files and loaded it into files and tables.
- Developed unit test case documents for moving code to the testing environment.
- Developed implementation plan for code deployment.
- Handled production deployment and post-production support.
Solution Environment: Python, Spark, Pentaho 7.x, PostgreSQL, Unix, Stash, Bitbucket, Toad, Eclipse, Java, Git, Jenkins, Maven, Hadoop, Hive, Sqoop, Jupyter
