Sr. Data Engineer Resume

SUMMARY

  • A data enthusiast with 11+ years of experience delivering data-driven solutions that increase the efficiency, accuracy, and utility of internal data processing. Experienced in building AI and machine learning models, providing data conversion solutions that move data from heterogeneous source systems to target systems, and analyzing data mining algorithms to deliver insights and implement action-oriented solutions to complex business problems.
  • Experience in data warehousing and data conversion projects as an ETL consultant/analyst, team leader, and module leader.
  • Data analysis and transformations in Python using PySpark, Pandas, NumPy, Seaborn, Matplotlib.
  • Experienced in handling file formats such as CSV, JSON, Parquet, TXT, XML, ORC, and Avro.
  • Experience building machine learning models (classification, regression, ANN, CNN, NLP, decision trees, and clustering, including K-Means).
  • Experience building APIs with the Flask framework and deploying models in a client-server architecture.
  • Experience building and deploying AI/ML models in AWS SageMaker.
  • Experience building ANN and CNN models using TensorFlow and Keras libraries.
  • Experience building a sentiment analysis model using NLP.
  • End-to-end implementation experience with the ETL tools Spark, Oracle Data Integrator, and Pentaho, and with Netezza DB.
  • Experience building mappings and workflows using the Informatica BDM tool.
  • Good knowledge of Pentaho for creating transformations and jobs.
  • IBM Certified Solution Developer for DataStage v8.5 and v9.1, with work experience in Pentaho, IBM InfoSphere DataStage 9.1, and Oracle Data Integrator (ODI).
  • Experience in the onsite-offshore project execution model.
  • Experience in Agile and waterfall development models.
  • Experience in the Scrum Master role: conducting daily Scrum stand-up meetings and creating and maintaining stories throughout the sprint.
  • Attended Scrum of Scrums meetings, maintained PI-level stats, and published them to stakeholders.
  • Extensively used ETL methodology to support data extraction, transformation, and loading processes.
  • Experience creating source-to-target data mapping documents based on business requirements.
  • Excellent team player with good organizational, communication, interpersonal & analytical skills.
  • Strong ability to adapt quickly to new applications, tools, platforms, and languages.

TECHNICAL SKILLS

Operating Systems: Windows NT, UNIX

Languages: Python, Java, SQL, PL/SQL, Unix Shell Scripting

Frameworks: Pandas, NumPy, TensorFlow, Keras, Matplotlib, Seaborn, Selenium WebDriver, Selenium Grid, JUnit, Cucumber

Databases: PostgreSQL, Hive, Netezza, Teradata, DB2, SQL Server 6.5/7/2000, Oracle 10g/11g/12c

Tools: Spark, Pentaho PDI 7.x, Oracle Data Integrator (ODI), DataStage, Informatica BDM, JIRA

File System: HDFS

CI/CD: Jenkins, Maven, Stash, Bitbucket, Git

Project Management: VersionOne

IDE: Jupyter, PyCharm, Toad, DBeaver, PuTTY, Control-M, Eclipse

PROFESSIONAL EXPERIENCE

Confidential

Sr. Data Engineer

Responsibilities:

  • Created Spark scripts to read data from HDFS and load it into target files and a Postgres DB.
  • Created data frames from HDFS files to perform transformations and load data from source to target.
  • Created and accessed Hive tables to perform transformations in Informatica mappings.
  • Monitored Spark jobs using the YARN application.
  • Used Hue to interact with Hive tables.
  • Performed data analysis and transformations in Python using PySpark, Pandas, NumPy, and Matplotlib.
  • Processed data from file formats such as CSV, Parquet, TXT, JSON, and XML.
  • Developed Informatica mappings using Informatica BDM (Big Data Management).
  • Developed ad hoc SQL based on business requests to validate the source data.
  • Created Sqoop scripts to import data into and export data from the HDFS data lake.
  • Monitored data loads periodically to ensure data quality.
  • Developed PostgreSQL PL/pgSQL scripts to load data into the target database.
  • Developed unit test case documents for moving code to the testing environment.
  • Developed the implementation plan for code deployment.

Solution Environment: Hadoop, Python, Spark, Hive, YARN, Hue, Sqoop, PostgreSQL, Pandas, Jupyter Notebook, Informatica BDM, Unix, Stash, Bitbucket, Toad, Eclipse, Java, Git, Jenkins, Maven, Control-M

Confidential

Lead Engineer

Responsibilities:

  • Created Spark scripts to perform data analytics and load results into the target database.
  • Used Spark to read data from different sources and transform it based on business requirements.
  • Handled claim requests and processed claim files.
  • Analyzed the ETL mapping document.
  • Prepared design documents for loading base tables from source to target tables.
  • Prepared design documents for loading dimension and fact tables based on the ETL mapping document.
  • Worked with the Product Owner to understand requirements.
  • Worked with the data modelling team to design the LLD and HLD.
  • Attended sprint planning meetings to prioritize stories for the current sprint based on capacity.
  • Developed Pentaho transformations and jobs for the ETL mapping document.
  • Developed a reusable framework job to capture the execution log.
  • Extracted data from JSON files and loaded it into files and tables.
  • Developed unit test case documents for moving code to the testing environment.
  • Developed the implementation plan for code deployment.
  • Handled production deployment and post-production support.

Solution Environment: Python, Spark, Pentaho 7.x, PostgreSQL, Unix, Stash, Bitbucket, Toad, Eclipse, Java, Git, Jenkins, Maven, Hadoop, Hive, Sqoop, Jupyter