
Data Engineer Resume


San Antonio, TX

SUMMARY

  • 5+ years of IT experience in technologies like Azure, ETL tools, Machine Learning, Data Extraction, Data Modeling, Statistical Modeling, Data Mining and Data Visualization.
  • Certified Associate Azure Data Engineer.
  • Data Engineer with experience using ETL tools and Machine Learning, passionate about implementing and exploring ML techniques.
  • Implemented ETL mapping for data collection from various data feeds.
  • Extensive experience in data warehousing projects using Talend ETL; developed mappings to populate data into dimension and fact tables.
  • Skilled in designing and implementing ETL architecture for a cost-effective and efficient environment. Experience in providing ETL solutions for any type of business model.
  • Hands-on experience in Azure Cloud Services (PaaS & IaaS), Azure Synapse Analytics, SQL Azure, Data Factory, Azure Analysis Services, Application Insights, Azure Monitoring, Key Vault, Azure Data Lake.
  • Good experience designing cloud-based solutions in Azure by creating Azure SQL databases, setting up Elastic Pool jobs, and designing tabular models in Azure Analysis Services.
  • Have extensive experience in creating pipeline jobs, schedule triggers using Azure data factory.
  • Developed Scala applications on Hadoop and Spark SQL for high-volume and real-time data processing.
  • Good understanding of classic Hadoop and YARN architecture along with various Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, Resource Manager, Node Manager, Application Master and Containers.
  • Developed batch processing solutions using Data Factory and Azure Databricks. Implemented Azure Databricks clusters, notebooks, jobs and autoscaling.
  • Knowledge of implementing Data Cleaning, Data Validation, Data Mapping, Data Analysis and Data Profiling, feature scaling, feature engineering, statistical modeling, testing and validation, and data visualization.
  • Expertise in transforming business resources and requirements into manageable data formats and analytical models, designing algorithms, building models, developing data mining and reporting solutions which scale across a massive volume of structured and unstructured data.
  • Proficient in Machine Learning algorithms and Predictive Modeling including Regression models, Decision Tree, Random Forests, Sentiment Analysis, Naïve Bayes Classifier, SVM and Ensemble Models.
  • Experience utilizing GitHub in the Machine Learning pipeline for best practice code management and collaboration.
  • Knowledge of Natural Language Processing (NLP) algorithms and Text Mining.
  • Hands on experience with different programming languages such as Java, Python, R.
  • Good team player with the ability to work independently; good interpersonal relations, strong communication skills, hardworking, and highly motivated.
  • Strong business sense and the ability to communicate data insights to both technical and nontechnical clients.
  • Proficient in analyzing problems and translating business concepts into functional requirements.

TECHNICAL SKILLS

Languages: R, SQL, Python, Shell scripting, PySpark

IDE: R Studio, Jupyter Notebook, PyCharm, Spyder, Atom, Amazon SageMaker, Azure ML Studio.

Databases: Oracle 11g, SQL Server, MySQL, MongoDB, PL/SQL, ETL.

Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Impala, Spark MLlib, PySpark, Sqoop, NVIDIA CUDA.

Operating Systems: Windows XP/7/8/10, Ubuntu, Unix, Linux.

Packages: R, wordcloud, neuralnet, CHATBOT, NLP, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, TensorFlow, PyTorch, CNN, RNN.

Web Technologies: HTML, CSS

Data Analytics Tools: R console, Python (NumPy, pandas, SciKit-learn, SciPy)

Visualization: Tableau, Informatica, Databricks.

PROFESSIONAL EXPERIENCE

Confidential, San Antonio, TX

Data Engineer

Responsibilities:

  • Migrated an existing on-premises application to AWS. Used AWS service S3 for small data sets processing and storage.
  • Responsible for estimating the cluster size, monitoring, and troubleshooting of the Spark Databricks cluster.
  • Implemented Spark using Python and Spark SQL for faster testing and processing of data.
  • Worked on scheduling all jobs using Airflow scripts written in Python, adding different tasks to DAGs (see the sketch following this role).
  • Experienced in triggering jobs in Airflow.
  • Developed a logging framework for the state modules in Splunk.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into PySpark.
  • Integrated Jenkins to run automatic builds when code is pushed to Git.
  • Worked with various tools including Airflow, Jenkins, Databricks, and AWS S3.
  • Used Jira and Rally for ticketing and issue tracking, and Jenkins for continuous integration and continuous deployment.
  • Worked with data owners, Business Units, Data Integration team and customers in fast paced Agile/Scrum environment.

Environment: AWS S3, AWS CloudWatch, Jenkins, Airflow, Teradata, Databricks, Python 3.2, PySpark, Jira, Rally
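
A minimal, illustrative Airflow sketch of the job scheduling described above; the DAG name, schedule, and task bodies are hypothetical and not taken from the actual project:

    # Minimal Airflow DAG sketch (DAG name, schedule, and task logic are illustrative only).
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract_from_s3():
        # Placeholder: pull the day's campaign files from S3 (hypothetical step).
        print("extracting from S3")

    def transform_with_spark():
        # Placeholder: run a PySpark transformation over the extracted files (hypothetical step).
        print("transforming with Spark")

    with DAG(
        dag_id="daily_campaign_pipeline",   # hypothetical DAG name
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
        transform = PythonOperator(task_id="transform_with_spark", python_callable=transform_with_spark)

        extract >> transform                # extract runs before transform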

Confidential, Austin, TX

Azure Data Engineer

Responsibilities:

  • Led a Connected Services team that extracted insights from connected devices in homes, energy consumption data, and user behavior data.
  • Responsible for leveraging Confidential 's data from multiple sources to drive actionable insights to our Pros and Homeowners.
  • Developed a classification model to detect the different types of customers interested in our goods and productized the same using Azure machine learning.
  • Collaborated with various business and technical teams to gather requirements around data quality rules and proposed the optimization of these rules if applicable, then designed and developed these rules with IDQ.
  • Utilized Informatica Developer, Informatica Power Exchange, Informatica Metadata Manager, and Informatica Analyst to design and develop custom objects and rules, reference data tables, and create/import/export mappings.
  • Responsible for a Data Communication Module planning and Proof of Concept activity.
  • Defined database architecture to handle forecasted data volume and transaction throughput.
  • Built a data platform to enable analytic and data science services.
  • Collaborated with the engineering team to bring analytical prototypes to production.
  • Prepared plans for all ETL procedures and architectures using Informatica.
  • Experienced in implementing data solutions in Azure including Azure SQL, Azure Synapse, Cosmos DB, Databricks.
  • Worked on querying data using Spark SQL on top of the Spark engine (see the sketch following this role).
  • Designed data solutions in Azure including data distributions and partitions, scalability, disaster recovery, and high availability.
  • Monitored and optimized data solutions in Azure, including with Azure Monitor.
  • Managed, monitored, and ensured the security and privacy of data to satisfy business needs.

Environment: SQL Server Management, Azure Data Factory, Azure Data Lake Analytics, Azure Analysis Services, ETL, Databricks, Informatica, Machine Learning.
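
A hedged PySpark / Spark SQL sketch of the kind of querying described in this role; the Data Lake path, table, and column names are assumptions, not details from the project:

    # Illustrative Spark SQL query over device telemetry (path and column names are hypothetical).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("connected-services-insights").getOrCreate()

    # Register device telemetry as a temporary view so it can be queried with SQL.
    telemetry = spark.read.parquet("/mnt/datalake/telemetry/")   # hypothetical path
    telemetry.createOrReplaceTempView("telemetry")

    # Aggregate energy consumption per home per day on top of the Spark engine.
    daily_usage = spark.sql("""
        SELECT home_id,
               to_date(event_time) AS usage_date,
               SUM(energy_kwh)     AS total_kwh
        FROM telemetry
        GROUP BY home_id, to_date(event_time)
    """)

    daily_usage.write.mode("overwrite").saveAsTable("analytics.daily_energy_usage")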

Confidential, San Antonio, TX

Data Engineer

Responsibilities:

  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics).
  • Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Maintained database files, tables, and dictionaries to prevent data loss.
  • Identified different types of relevant data to improve business performance with better analytics. Used data to create charts and reports highlighting different findings.
  • Worked in large-scale database environments such as Hadoop and MapReduce, with a working knowledge of Hadoop clusters, nodes, and the Hadoop Distributed File System (HDFS).
  • Responsible for estimating the cluster size, monitoring and troubleshooting of the Hadoop cluster.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns (see the sketch following this role).
  • Moved the mappings from development environment to test environment.
  • Designed an ETL process using Informatica to load data from flat files and Excel files into the target Oracle data warehouse database.
  • Improved workflow performance by shifting filters as close as possible to the source and selecting tables with fewer rows as the master during joins.
  • Used connected and unconnected lookups whenever appropriate along with the use of appropriate caches.
  • Created tasks in Workflow Manager and monitored the sessions in Workflow Monitor.
  • Set up permissions for groups and users in all development environments.

Environment: Azure Data Factory, HDFS, PySpark, Oracle, Spark SQL, Azure Data Lake, Azure Data Storage, Informatica, Hive.
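
A brief PySpark sketch of the multi-format extraction and aggregation described above; the mount paths and column names are assumptions:

    # Illustrative PySpark extraction/aggregation across file formats (paths and columns are hypothetical).
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("usage-pattern-aggregation").getOrCreate()

    # Read the same logical dataset landed in two different file formats.
    csv_df = spark.read.option("header", True).csv("/mnt/adls/raw/usage_csv/")
    json_df = spark.read.json("/mnt/adls/raw/usage_json/")

    # Align schemas, cast types, and union before aggregating.
    cols = ["customer_id", "event_type", "duration_sec"]
    usage = (
        csv_df.select(*cols)
        .unionByName(json_df.select(*cols))
        .withColumn("duration_sec", F.col("duration_sec").cast("double"))
    )

    # Aggregate to surface usage patterns per customer and event type.
    patterns = usage.groupBy("customer_id", "event_type").agg(
        F.count("*").alias("event_count"),
        F.avg("duration_sec").alias("avg_duration_sec"),
    )

    patterns.write.mode("overwrite").parquet("/mnt/adls/curated/usage_patterns/")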

Confidential

Data Analyst / Data Scientist

Responsibilities:

  • Used Python 3.x (NumPy, SciPy, pandas, scikit-learn, seaborn) and Spark 2.0 (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes.
  • Worked along with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Performed univariate and multivariate analysis on data to identify any underlying pattern in the data and associations between the variables.
  • Performed data imputation using Scikit-learn package in Python.
  • Analyzed project-related problems and created innovative solutions involving technology, analytic methodologies, and advanced solution components.
  • Used Excel and Tableau to analyze the health care website data and build informative reports.
  • Implemented NLP techniques to optimize customer satisfaction.
  • Developed and implemented predictive models using machine learning algorithms such as linear regression, classification, multivariate regression, Naive Bayes, Random Forests, K-means clustering, KNN, PCA and regularization for data analysis.
  • Built regression models including Lasso, Ridge, SVR, and XGBoost to predict Customer Lifetime Value.
  • Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest to predict customer churn rate (see the sketch following this role).
  • Used F-score, AUC/ROC, confusion matrix, MAE, and RMSE to evaluate model performance.
  • Applied clustering algorithms such as hierarchical and K-means using scikit-learn and SciPy.

Environment: Python 2.x, NLP, R, Machine Learning (Regressions, KNN, SVM, Decision Tree, Random Forest, XGBoost, LightGBM, Collaborative Filtering, Ensemble), pandas, NumPy.
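
A small scikit-learn sketch of the churn classification workflow described above; the input file and feature/target column names are hypothetical:

    # Illustrative churn classifier with the evaluation metrics named above (data layout is assumed).
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("customer_churn.csv")     # hypothetical input file
    X = df.drop(columns=["churned"])           # hypothetical feature columns
    y = df["churned"]                          # hypothetical binary target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)

    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]

    # Evaluate with F-score, AUC/ROC, and a confusion matrix.
    print("F1:", f1_score(y_test, pred))
    print("AUC:", roc_auc_score(y_test, proba))
    print(confusion_matrix(y_test, pred))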

Confidential

Python Developer

Responsibilities:

  • Worked on the project from gathering requirements to developing the entire application. Worked on Anaconda Python Environment.
  • Created, activated and programmed in Anaconda environment. Wrote programs for performance calculations using NumPy.
  • Designed python routines to log into the websites and fetch data for selected options.
  • Developed various statistical machine learning and data mining solutions to business problems and generated data visualizations using R, Python, and Tableau.
  • Worked on development of SQL and stored procedures on MySQL. Wrote and executed MySQL database queries from Python using the MySQL connector and MySQLdb packages (see the sketch following this role).
  • Analyzed the code thoroughly and reduced code redundancy to an optimal level.
  • Designed and built a text classification application using different text classification models.
  • Worked on writing data to and reading data from CSV and Excel file formats.
  • Responsible for designing, developing, testing, deploying and maintaining the web application.

Environment: Python 2.x, Anaconda, Spyder (IDE), Tableau, Python libraries such as NumPy, SQLAlchemy, MySQLdb.
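
A minimal sketch of querying MySQL from Python and computing a performance figure with NumPy, as described above; the connection details, table, and column names are hypothetical:

    # Illustrative MySQL query via mysql-connector plus a NumPy calculation (all identifiers are assumed).
    import mysql.connector
    import numpy as np

    conn = mysql.connector.connect(
        host="localhost", user="app_user", password="secret", database="reporting"
    )
    cursor = conn.cursor()

    # Fetch daily returns and compute a cumulative performance figure with NumPy.
    cursor.execute("SELECT daily_return FROM portfolio_returns WHERE year = %s", (2015,))
    returns = np.array([row[0] for row in cursor.fetchall()], dtype=float)

    print("Cumulative performance:", np.prod(1 + returns) - 1)

    cursor.close()
    conn.close()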
