Sr. Data Engineer Resume
Boston, MA
SUMMARY
- 7+ years of technical experience as a Data Engineer, modeling clients' business needs, developing effective and efficient solutions, and ensuring client deliverables within committed timelines.
- Experience using build tools like Maven and SBT, version control tools like GitHub, CI/CD tools like Jenkins, and Agile project management tools like JIRA.
- Development and maintenance using big data technologies such as HDFS, MapReduce, Hive, Sqoop, Airflow, HBase, Spark, and Python.
- Good understanding of Data Modeling (Dimensional and Relational) concepts such as Star Schema Modeling, Snowflake Schema Modeling, and Fact and Dimension Tables.
- Strong experience as a BI Consultant in Data Warehousing environments, covering Analysis, Data Modeling, Design, Development, Administration, Data Migration, Testing, Support, and Maintenance using MSBI tools (SQL Server, SSIS, SSAS, SSRS) along with Power BI.
- Experience in Data Warehousing, Data Modeling, Data Marts, Data Visualization, Reporting, Data Quality, Data Virtualization, and Data Science solutions.
- Experience working with varied forms of data infrastructure, including relational databases such as MySQL, distributed processing frameworks such as Hadoop and Spark, and column-oriented data warehouses such as Amazon Redshift.
- Experience in utilizing AWS Cloud Services - S3, EMR, EC2, Redshift, Athena, Glue Metastore, and Step Functions.
- Good exposure to star and snowflake schemas and data modeling, with work on a variety of data warehouse projects.
- Experience with DevOps tools like Jenkins for continuous integration, GitLab as a repository, and Chef for automation of system-level patching and upgrades.
- Expertise in using cloud-based managed services for data warehousing in Amazon Web Services (AWS) and Azure (Azure Data Lake Storage, Azure Data Factory).
- Expertise in creating Spark Applications using Python (PySpark) and Scala.
- Strong Experience in working with Linux/Unix environments, writing Shell Scripts.
- Experienced in Microsoft Business Intelligence tools, developing SSIS (Integration Services), SSAS (Analysis Services), and SSRS (Reporting Services), building Key Performance Indicators, and OLAP cubes.
- Experienced in Python data manipulation for loading and extraction, as well as Python libraries such as NumPy, SciPy, and Pandas for data analysis and numerical computations.
- Extensive working experience with Python, including Scikit-learn, SciPy, Pandas, and NumPy, for developing machine learning models and manipulating and handling data.
- Strong experience with big data processing using Hadoop technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
- Experience with Data Warehousing and ETL concepts using Informatica PowerCenter, OLAP, and OLTP.
- Extensive hands-on experience with distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, and effective use of Azure SQL Database, MapReduce, Hive, SQL, and PySpark to solve big data problems.
- Excellent documentation skills including technical writing, Visio, PowerPoint, flowcharting.
- Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture.
- Experience on MS SQL Server, including SSRS, SSIS, and T-SQL.
- Experience with data transformations utilizing SnowSQL in Snowflake.
- Extensively used SQL, NumPy, Pandas, Scikit-learn, Spark, and Hive for data analysis and model building.
- Skillful in data analysis using SQL on Oracle, MS SQL Server, DB2, Teradata, and AWS.
- Experienced in troubleshooting ETL jobs and Data Warehouse, Data Mart, and data store models.
PROFESSIONAL EXPERIENCE
Sr. Data Engineer
Confidential, BOSTON, MA
Responsibilities:
- Developed complex Talend ETL jobs to migrate data from flat files to databases. Pulled files from the mainframe into the Talend execution server using multiple FTP components.
- Scheduled Oracle jobs and implemented dependencies using Oracle Scheduler.
- Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
- Developed Spark Streaming applications to consume streaming JSON messages from Kafka topics (see the first sketch after this list).
- Processed data using Hadoop technologies such as MapReduce, Hive, Pig, and Apache Spark, with good knowledge of cloud technologies like Amazon Web Services (AWS).
- Built and maintained ETL SaaS platforms using Python.
- Involved in the Software Development Life Cycle using various methodologies like Waterfall, Agile, and Scrum.
- Used NumPy, Pandas, Matplotlib, SciPy, Scikit-learn, Seaborn, and TensorFlow in Python to implement various machine learning algorithms.
- Utilized Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
- Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous or heterogeneous data sources, and built various graphs for business decision-making using the Python Matplotlib library.
- Leveraged business intelligence tools like Power BI to publish and present KPI-driven dashboards for critical business processes.
- Rebuilt a custom sales/commission reporting mechanism using Matillion ETL + Snowflake to replace an unreliable Talend/MySQL system.
- Used the AWS Glue catalog with a crawler to pull data from S3 and perform SQL query operations, and used a JSON schema to define table and column mappings from S3 data to Redshift (see the second sketch after this list).
- Built, enhanced, and optimized data pipelines using reusable frameworks to support the data needs of the analytics and business teams, using Spark and Airflow.
- Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
- Worked with Oracle Scheduler for scheduling different processes.
- Created a continuous integration and continuous delivery (CI/CD) pipeline on Azure that helps automate steps in the software delivery process.
- Loaded data into Snowflake tables from an internal stage using SnowSQL (see the third sketch after this list).
- Worked as DevOps to automate application deployments using Jenkins/Git.
- Designed and developed SSIS packages to import and export data from MS Excel, SQL Server, and flat files.
- Implemented AWS solutions using EC2, S3, RDS, EBS, Elastic Load Balancer, and Auto Scaling groups; optimized volumes and EC2 instances.
- Designed and maintained ETL packages via Matillion ETL for Redshift.
- Worked on big data on AWS using Glue and related cloud services, i.e., EC2, S3, EMR, and DynamoDB.
- Involved in development and created PL/SQL stored procedures and functions.
- Developed ETL applications on large volumes of data using different tools: MapReduce, Spark-Scala, PySpark, Spark SQL, and Pig.
- Staged API or Kafka data (in JSON format) into Snowflake DB by flattening it for different functional services.
- Implemented AWS CloudWatch on the production SSIS server to provide a single view of custom script errors across multiple directories.
- Built a configurable Scala and Spark based framework to connect to common data sources like MySQL, Oracle, Postgres, SQL Server, Salesforce, and BigQuery and load the data into BigQuery.
- Created ETL pipelines using AWS Glue, Lambda, and EMR to process data from various sources and stored the final data in Redshift.
- Used MySQL as the backend database, with Python's MySQLdb module as the database connector to interact with the database server.
- Involved in building data models and dimensional modeling with 3NF, Star, and Snowflake schemas for OLAP and Operational Data Store (ODS) applications.
- Performed end-to-end architecture and implementation assessment of various AWS services like Amazon EMR, Redshift, S3, Athena, Glue, and Kinesis.
- Developed Power BI reports and dashboards from multiple data sources using data blending.
- Created and maintained documentation, data locations/sources and data models, ETL processes, and associated code in Python and Spark. Created Airflow scheduling scripts in Python.
- Developed under Scrum methodology and in a CI/CD environment using Jenkins.
- Performed DevOps duties to create and improve data infrastructure capabilities.
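A minimal PySpark sketch of the Kafka consumption pattern from the bullets above (first sketch). The broker address, topic name, JSON schema, and S3 paths are placeholder assumptions; the production jobs used project-specific schemas and sinks.

```python
# Sketch: consume JSON messages from a Kafka topic with Spark Structured Streaming.
# Assumes the spark-sql-kafka package is on the classpath; all names/paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-json-consumer").getOrCreate()

# Example schema for the incoming JSON payload (assumed for illustration).
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
       .option("subscribe", "events-topic")                # placeholder topic
       .option("startingOffsets", "latest")
       .load())

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
              .select(from_json(col("json"), event_schema).alias("e"))
              .select("e.*"))

query = (events.writeStream
         .format("parquet")
         .option("path", "s3a://example-bucket/events/")                    # placeholder sink
         .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
         .start())
query.awaitTermination()
```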
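A hedged boto3 sketch of the Glue crawler plus SQL-query flow (second sketch). The crawler name, IAM role, catalog database, bucket paths, and query are hypothetical, and Athena stands in here as the ad hoc SQL layer over the cataloged S3 data.

```python
# Sketch: register S3 data in the Glue Data Catalog with a crawler, then query it via Athena.
# All names, roles, and paths below are illustrative assumptions.
import time
import boto3

glue = boto3.client("glue")
athena = boto3.client("athena")

# Create and start a crawler that catalogs the raw S3 data (one-time setup).
glue.create_crawler(
    Name="raw-events-crawler",                                   # placeholder crawler name
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",       # placeholder IAM role
    DatabaseName="raw_events_db",                                # placeholder catalog database
    Targets={"S3Targets": [{"Path": "s3://example-bucket/raw/events/"}]},
)
glue.start_crawler(Name="raw-events-crawler")

# Once the table exists in the catalog, run an ad hoc SQL query through Athena.
resp = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS cnt FROM events GROUP BY event_type",
    QueryExecutionContext={"Database": "raw_events_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)

# Poll until the query finishes; results land at the output location above.
query_id = resp["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)
```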
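A small sketch of the Snowflake internal-stage load (third sketch). The bullet above used the SnowSQL CLI; the same PUT and COPY INTO statements are shown here through the snowflake-connector-python package, with placeholder account, table, and file names.

```python
# Sketch: stage a local file and load it into a Snowflake table from its internal stage.
# Equivalent to running PUT / COPY INTO from SnowSQL; connection details are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # placeholder account
    user="LOAD_USER",              # placeholder user
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

cur = conn.cursor()
try:
    # Upload the local file to the table's internal stage (@%ORDERS).
    cur.execute("PUT file:///tmp/orders.csv @%ORDERS AUTO_COMPRESS=TRUE")

    # Copy the staged file into the target table.
    cur.execute("""
        COPY INTO ORDERS
        FROM @%ORDERS
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
finally:
    cur.close()
    conn.close()
```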
Data Engineer
Confidential, Milwaukee, WI
Responsibilities:
- Worked with Python and R libraries like NumPy, Pandas, Scikit-Learn, SciPy, and TensorFlow.
- Developed Power BI reports and dashboards from multiple data sources using data blending.
- Extensively worked on Data Services for migrating data from one database to another database.
- Heavily involved in testing Snowflake to understand the best possible way to use the cloud resources.
- Involved in continuous integration and deployment (CI/CD) using DevOps tools like Looper and Concord.
- Developed Spark applications using Scala and Python to implement various data cleansing/validation and processing activities on large-scale datasets ingested from traditional data warehouse systems.
- Gained valuable experience working with AWS cloud services like EC2, S3, RDS, Redshift, Athena, and Glue.
- Used the ETL DataStage Director to schedule and run jobs, test and debug components, and monitor performance statistics.
- Created Python/SQL scripts in Databricks notebooks to transform data from Redshift tables into Snowflake via S3 buckets.
- Loaded data from different sources into a data warehouse to perform data aggregations for business intelligence using Python.
- Developed a data platform from scratch and took part in the requirement gathering and analysis phase of the project, documenting the business requirements.
- Worked on designing tables in Hive and MySQL using Sqoop and processing data, including importing and exporting databases to HDFS; involved in processing large datasets of different forms, including structured, semi-structured, and unstructured data.
- Designed and developed Power BI graphical and visualization solutions from business requirement documents and plans for creating interactive dashboards.
- Designed, built, and supported T-SQL, SSIS, SSRS, SSAS, and data warehouse solutions for customers in various industries.
- Worked on migrating a log parser from C# and MySQL to PySpark and Spark SQL on Databricks (AWS).
- Performed DevOps duties to create and improve data infrastructure capabilities.
- Worked with Snowflake external functions to trigger AWS Lambda functions based on a Snowflake event.
- Migrated ETL code from Talend to Informatica. Involved in development, testing, and post-production for the entire migration project.
- Used Spark Streaming to receive real-time data from Kafka and store the stream data to HDFS using Python and NoSQL databases such as HBase and Cassandra.
- Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling in AWS CloudFormation.
- Utilized Power BI (Power View) to create various analytical dashboards depicting critical KPIs such as legal case matters, billing hours, and case proceedings, along with slicers and dicers enabling end users to apply filters.
- Assisted Data Scientists in the ETL processes, including SSIS packages.
- Created and developed notebooks (using PySpark and Spark SQL) and published dashboards with desired charts and tables.
- Worked extensively on AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
- Proven track record in troubleshooting DataStage jobs and addressing production issues like performance tuning and enhancement.
- Scheduled jobs using Airflow and used Airflow hooks to connect to various traditional databases like DB2, Oracle, and Teradata (see the sketch after this list).
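A minimal Airflow sketch of the scheduling-and-hooks pattern from the last bullet, assuming a hypothetical DAG ID, Oracle connection ID, and query; the real DAGs, connections, and SQL were environment-specific.

```python
# Sketch: a daily Airflow DAG that pulls rows from a traditional RDBMS through a database hook.
# DAG id, connection id, and SQL are illustrative placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.oracle.hooks.oracle import OracleHook


def extract_daily_snapshot(**context):
    # The hook resolves credentials from the Airflow connection store.
    hook = OracleHook(oracle_conn_id="oracle_dw")          # placeholder connection id
    rows = hook.get_records(
        "SELECT order_id, status FROM orders WHERE trunc(updated_at) = trunc(sysdate)"
    )
    print(f"pulled {len(rows)} rows")


with DAG(
    dag_id="daily_oracle_extract",                          # placeholder DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(
        task_id="extract_daily_snapshot",
        python_callable=extract_daily_snapshot,
    )
```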
ETL Developer
Confidential, San Antonio, TX
Responsibilities:
- Used SSIS to create ETL packages to validate, extract, transform, and load data into various Data Marts (MS SQL).
- Created Sqoop scripts to ingest data from HDFS to Teradata and from SQL Server to HDFS and to PostgreSQL.
- Installed and monitored PostgreSQL databases using standard monitoring tools like Nagios.
- Analyzed existing systems and proposed improvements in processes and systems for the usage of modern scheduling tools like Airflow, and migrated legacy systems into an enterprise data lake built on Azure Cloud.
- Provided end-user support to all Confidential end users via phone, email, and Jabber, in person or remotely.
- Extensively worked on UNIX shell scripting for splitting groups of files into various small files and for file transfer automation.
- Created Sqoop scripts to import/export user profile data from RDBMS to the S3 data lake.
- Developed various Spark applications using Scala to perform enrichments of user behavioral data (clickstream data) merged with user profile data.
- Involved in data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for downstream model learning and reporting. Worked with the Autosys scheduler for scheduling different processes.
- Assisted in UAT Testing and provided necessary reports to the business users.
- Ingested gigabytes of clickstream data from external servers such as FTP servers and S3 buckets on a daily basis using custom input adapters (see the sketch after this list).
- Gathered business requirements and prepared technical design documents, target to source mapping document, mapping specification document.
- Parsed complex files through Informatica Data Transformation and loaded them into the database.
- Optimized query performance using Oracle hints, forcing indexes, working with constraint-based loading, and a few other approaches.
- Built and maintained SQL scripts, indexes, and complex queries for data analysis and extraction.
- Automated the configuration management of database and big data systems.
- Used PySpark and MapReduce technologies for faster runtimes.
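A hedged Python sketch of the daily S3 clickstream ingestion mentioned above; the bucket name, date-partitioned prefix layout, and local landing directory are assumptions standing in for the custom input adapters.

```python
# Sketch: pull the previous day's clickstream files from S3 down to a local landing area.
# Bucket, prefix layout, and landing directory are illustrative assumptions.
import os
from datetime import date, timedelta

import boto3

BUCKET = "example-clickstream-bucket"          # placeholder bucket
LANDING_DIR = "/data/landing/clickstream"      # placeholder landing directory


def ingest_daily_clickstream(run_date=None):
    run_date = run_date or (date.today() - timedelta(days=1))
    prefix = f"clickstream/{run_date:%Y/%m/%d}/"            # assumed date-partitioned layout

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    local_dir = os.path.join(LANDING_DIR, str(run_date))
    os.makedirs(local_dir, exist_ok=True)
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            local_path = os.path.join(local_dir, os.path.basename(key))
            s3.download_file(BUCKET, key, local_path)       # one object at a time


if __name__ == "__main__":
    ingest_daily_clickstream()
```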
SOFTWARE DEVELOPER
Confidential, Lakeland, FL
Responsibilities:
- Worked on writing Constraints, Indexes, Views, Stored Procedures, Cursors, and Triggers.
- Designed and developed data management system using MySQL.
- Used jQuery for all client-side JavaScript manipulation.
- Created unit test/regression test framework for working/new code.
- Wrote Python scripts to parse XML documents and load the data into the database (see the sketch after this list).
- Interacted with clients and discussed their requirements during requirement gathering sessions.
- Worked with a team of developers on Python applications for risk management.
- Used Agile - Scrum methodology for the software development life cycle of the project.
- Developed entire frontend and backend modules using Python on Django Web Framework.
- Utilized Python's os module for forking and cloning projects in a UNIX environment.
- Used the Subversion version control tool to coordinate team development.
- Developed tools to automate basic tasks using shell scripting and Python.
- Modified SQL and PL/SQL procedures and triggers to obtain optimized output.
- Analyzed Performance test results, and prepared detailed Performance Test Reports including recommendations.
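A small sketch of the XML-parsing-and-loading pattern referenced above, using only Python's standard library; the element names and the SQLite target are stand-ins for the actual document layout and database.

```python
# Sketch: parse records from an XML document and load them into a database table.
# The element/attribute names and the SQLite target are illustrative assumptions.
import sqlite3
import xml.etree.ElementTree as ET


def load_orders(xml_path="orders.xml", db_path="orders.db"):
    tree = ET.parse(xml_path)
    root = tree.getroot()

    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, customer TEXT, amount REAL)"
    )

    rows = []
    for order in root.findall(".//order"):                  # assumed <order> elements
        rows.append((
            order.get("id"),
            order.findtext("customer"),
            float(order.findtext("amount", default="0")),
        ))

    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()
    conn.close()


if __name__ == "__main__":
    load_orders()
```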