Sr. Data Engineer Resume

Richmond

SUMMARY

  • Meticulous Data Scientist accomplished in compiling, transforming, and analyzing complex information through software. Expert in machine learning and large dataset management.
  • Demonstrated success in identifying relationships and building solutions to business problems.

TECHNICAL SKILLS

Programming Languages: Java, Python, SQL, T-SQL, HTML, CSS, JavaScript, R

Databases: Oracle, Snowflake, MS Access, MySQL, Teradata (RSDS Teradata SQL Assistant), MongoDB, Hadoop, Caspio, Apache Hive

Tools: Power BI, Tableau, SQL Developer, Excel, Microsoft Office, Jupyter Notebooks, IBM SPSS, Jira, Microsoft Azure, AWS, Balsamiq, Visio, PowerDesigner

Version Control: GitHub, SVN, Mockito

Platforms: Windows (98, XP, 7), UNIX, Linux, Ubuntu

Cloud Technologies: Amazon AWS (EC2, EBS, EMR, RDS, IAM, Glue), Google Cloud Platform

Machine Learning: Clustering, Regression, Decision Trees, Classification, Survival Analysis, etc.

PROFESSIONAL EXPERIENCE

Confidential, Richmond

Sr. Data Engineer

Responsibilities:

  • Perform data comparisons between SDP (Streaming Data Platform) real-time data, AWS S3 data, and Snowflake data using Databricks, Spark SQL, and Python (a comparison sketch follows this list).
  • Managed the UI/UX team for the front-end development of the project
  • Created data mappings and a data dictionary for ETL and application support, metadata, and DML as required
  • Responsible for extracting, transforming, and loading (ETL) data from various sources into the organization's data systems. This involves working with different data formats, APIs, databases, and data integration tools to ensure data accuracy and consistency.
  • Performed reinsurance transactions, analysis, reconciliations, and research with knowledge of GAAP/STAT accounting and other regulatory requirements
  • Built tools using Tableau to allow internal and external teams to visualize and extract insights from big data platforms.
  • Responsible for expanding and optimizing data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams.
  • Built best-practice ETL pipelines with Apache Spark to load and transform raw data into easy-to-use dimensional data for self-service reporting.
  • Developed a real-time peak/valley alert monitoring system using Teradata, SQL, and Python; designed the overall logic of the statistical model, metric settings, threshold algorithm, and the auto-triggered email-sending system.
  • Built a series of functions to clean, repopulate, and pivot the original database in Teradata using SQL queries.
  • Tuned hyperparameters in the algorithm to optimize the dataflow job and make the product meet requirements from internal customers.
  • Designed the interface and content of the sample alert email in HTML, including a table of key information, a dashboard chart generated with Python as an attachment, and a link to the terminal dashboard on an internal platform.
  • Built a compliance and data quality checks pipeline using Airflow, SQL, Teradata, and cloud functions (see the DAG sketch after this list).
  • Contribute to the technical architecture design, documentation, and implementation.
  • Followed Agile development with Jira as the development management and issue-tracking tool; created a Confluence page for development best practices and project documentation.
  • Automated the process of sending data quality alerts to a Slack channel and email using Databricks, Python, and HTML, so users are alerted whenever there are issues with the data.
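
A minimal sketch of the row-level comparison logic behind the SDP/S3/Snowflake checks above, assuming a hypothetical `event_id` business key and `amount` measure; inline DataFrames stand in for the actual S3 (`spark.read.parquet`) and Snowflake (Spark connector) reads:

```python
# Sketch of a row-level data comparison in Spark SQL; keys and columns are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sdp-data-comparison").getOrCreate()

# Stand-ins for the S3 and Snowflake reads.
s3_df = spark.createDataFrame([(1, 10.0), (2, 20.0), (3, 30.0)], ["event_id", "amount"])
sf_df = spark.createDataFrame([(1, 10.0), (2, 25.0), (4, 40.0)], ["event_id", "amount"])

s3_df.createOrReplaceTempView("s3_events")
sf_df.createOrReplaceTempView("sf_events")

# A full outer join on the key surfaces rows missing on either side
# as well as value mismatches between the two copies.
diff = spark.sql("""
    SELECT COALESCE(a.event_id, b.event_id) AS event_id,
           a.amount AS s3_amount,
           b.amount AS sf_amount
    FROM s3_events a
    FULL OUTER JOIN sf_events b ON a.event_id = b.event_id
    WHERE a.event_id IS NULL OR b.event_id IS NULL OR a.amount <> b.amount
""")
diff.show()
```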
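
A minimal Airflow 1.10-style sketch of how such a data quality check pipeline could be wired up; the DAG id, schedule, and stubbed check are illustrative assumptions, and the real check would query Teradata/Snowflake:

```python
# Hypothetical data-quality DAG; import paths follow the Airflow 1.10 layout.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python_operator import PythonOperator


def run_quality_check(**context):
    # Stub: in the real pipeline this would run a SQL check against Teradata,
    # e.g. SELECT COUNT(*) FROM tbl WHERE business_key IS NULL.
    null_count = 0
    if null_count > 0:
        raise ValueError(f"Data quality check failed: {null_count} null keys")


default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "email_on_failure": True,  # failure alerts go out by email
}

dag = DAG(
    dag_id="data_quality_checks",  # hypothetical DAG id
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
)

check = PythonOperator(
    task_id="null_key_check",
    python_callable=run_quality_check,
    provide_context=True,
    dag=dag,
)
```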

Environment: Power BI, R, Python, AWS, RSDS Teradata SQL Assistant, Google Cloud Platform, MySQL, XML, Postman, Outlook, Spark 2.4, Spark SQL, Kafka 2.3.0, Apache Airflow 1.10.4, Snowflake, Databricks

Confidential

Insights and Data Engineer

Responsibilities:

  • Followed Agile software development methodology to build the application iteratively and incrementally; participated in scrum-related activities and daily scrum meetings.
  • Monitored reinsurance receivable balances and worked with ETPRs on claim reimbursement
  • Researched and prepared claim refunds to the ETPRs
  • Provided various reports to the ETPRs on a weekly/monthly/quarterly basis
  • Worked with the Group Technology team on removing risk dependencies and analyzed which systems and applications were involved
  • Worked on customer clustering based on machine learning; the statistical modeling effort included building predictive models
  • Wrote Python scripts to parse JSON documents and load the data into the database; worked on Google Cloud Platform to train and test datasets and built a pipeline with BigQuery (a loader sketch follows this list)
  • Installed, configured, and hosted the Tomcat app servers and MySQL database servers on physical servers (Linux, Windows), and Amazon AWS virtual servers (Linux).
  • Documented all the data transformations, validations, downstream impacts, and APIs, and provided guidance to the engineers
  • Performed SQL queries using the RSDS Teradata SQL Assistant in order to analyze the daily feeds of the banking systems
  • Used AWS services such as EC2 for deployments, S3 for storage, and SES and SQS for sending notifications.
  • Created platform infrastructure with AWS (EC2, RDS, ELB) and used Jenkins to run the automated deployments.
  • Implemented business models with Tableau and Power BI and used DAX expressions to effectively communicate the business insights
  • Designed, built, and deployed a set of Python modeling APIs for customer analysis, integrating multiple machine learning techniques for user behavior prediction
  • Performed exploratory data analysis using R, and generated various graphs and charts for analyzing data using Python libraries.
  • Used classification techniques including Random Forest and Logistic Regression to quantify the likelihood of each user referring (see the classifier sketch after this list)
  • Applied machine learning algorithms with Spark and standalone R/Python
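
A minimal sketch of the JSON-parse-and-load step described above; the field names are illustrative, and `sqlite3` stands in for the actual MySQL target so the example is self-contained:

```python
# Parse a JSON document and bulk-load it into a relational table.
import json
import sqlite3

# Inline stand-in for the real JSON feed file.
raw = '[{"id": 1, "name": "Ann", "segment": "retail"}, {"id": 2, "name": "Bob"}]'
records = json.loads(raw)

# Flatten each record into a row tuple; missing fields default to NULL.
rows = [(r["id"], r["name"], r.get("segment")) for r in records]

conn = sqlite3.connect(":memory:")  # the real target would be a MySQL connection
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, segment TEXT)")
conn.executemany("INSERT INTO customers (id, name, segment) VALUES (?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT * FROM customers").fetchall())
```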
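
A minimal scikit-learn sketch of the referral-likelihood classification with Random Forest and Logistic Regression; synthetic features stand in for the real user data:

```python
# Compare two classifiers on the likelihood that a user refers others.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the engineered user-behavior features.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=200)):
    model.fit(X_train, y_train)
    # Score the probability of the positive (referring) class.
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{type(model).__name__}: AUC = {auc:.3f}")
```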

Environment: Power BI, R, Python, Tableau, AWS, RSDS Teradata SQL Assistant, Google Cloud Platform, MySQL, XML, Postman, Outlook, Spark framework, MS Access, MS Excel, Jupyter Notebooks, Oracle, Jenkins, Linux, Windows.

Confidential

Data Engineer

Responsibilities:

  • Improved overall user experience through support, training, troubleshooting, improvements, and communication of system changes.
  • Performed specified data processing and statistical techniques such as sampling, time series, estimation, correlation, and regression using R.
  • Applied data mining techniques using LR, classification, and clustering.
  • Used Jupyter Notebooks (NumPy, seaborn, pandas, SciPy) and Spark (PySpark, MLlib) to develop a variety of models and algorithms for analytic purposes
  • Collaborated with the data engineers to implement ETL processing and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Utilized NLP (Natural Language Processing) techniques to optimize customer satisfaction
  • Installed, configured, and hosted the Tomcat app servers and MySQL database servers on physical servers (Linux, Windows), and Amazon AWS virtual servers (Linux).
  • Implemented a framework to migrate relational data to non-relational data stores and to run performance tests against different NoSQL vendors.
  • Designed rich data visualizations to model data into human-readable form with Tableau and Power BI
  • Developed MapReduce Python modules for predictive analysis and ML in Hadoop on AWS
  • Performed numerous SQL queries in Teradata SQL workbench to prepare the right data sets for Tableau dashboards; the queries retrieved data from multiple tables using various join conditions, enabling optimized data extractions for Tableau workbooks.
  • Worked on data cleaning and ensured data quality, consistency, and integrity using pandas and NumPy (see the cleaning sketch after this list).
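
A minimal pandas/NumPy sketch of the cleaning step described above; the column names and cleaning rules are illustrative assumptions:

```python
# Deduplicate, coerce types, repopulate gaps, and drop residual bad rows.
import numpy as np
import pandas as pd

# Inline stand-in for a raw data feed with duplicates and bad values.
df = pd.DataFrame({
    "txn_id": [1, 1, 2, 3, 4],
    "amount": ["10.5", "10.5", "oops", "30.0", None],
    "event_date": ["2020-01-01", "2020-01-01", "2020-01-02", "bad", "2020-01-04"],
})

df = df.drop_duplicates()                                      # enforce uniqueness
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")    # bad values -> NaN
df["amount"] = df["amount"].fillna(df["amount"].median())      # repopulate gaps
df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
df = df[np.isfinite(df["amount"]) & df["event_date"].notna()]  # drop residual bad rows
print(df)
```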

Environment: R, AWS, NoSQL, MySQL, Jupyter Notebooks, Python, Power BI, Business Intelligence, Wi-Fi routers and switches, MapReduce, Hadoop, Tableau, NLP, Teradata, Git, Agile/Scrum, Hive, Pig, Oracle, Tomcat Server.

Confidential

Data engineer

Responsibilities:

  • Synthesized current business intelligence data to produce reports and polished presentations, highlighting findings and recommending changes.
  • Worked on XPLUR, a website that helps students, and developed all the requirements given by the client.
  • Performed data scraping using Octoparse, which helped scrape LinkedIn and Seek data for loading into the database.
  • Used Caspio, a third-party application, to run the data on the back end.
  • Analyzed SAP transactions to build a logical business intelligence model for real-time reporting needs.

Environment: Excel, XPLUR, Octoparse, Caspio, SAP.

Confidential

Data engineer

Responsibilities:

  • Used SSIS to create ETL packages to validate, extract, transform, and load data into the data warehouse and data marts.
  • Created views, table-valued functions, joins, and complex subqueries to provide reporting solutions.
  • Optimized query performance by modifying T-SQL queries, removing unnecessary columns and redundant data, normalizing tables, establishing joins, and creating indexes (a tuned-query sketch follows this list)
  • Created SSIS packages using Pivot transformations, Fuzzy Lookup, Derived Column, Conditional Split, and Data Flow tasks.
  • Wrote SQL queries and PL/SQL procedures, functions, triggers, sequences, cursors, etc.
  • Migrated data from the SAS environment to SQL Server 2008 via SQL Server Integration Services (SSIS).
  • Built REST APIs to easily add new analyses or issuers to the model
  • Developed dynamic performance and finance reports (profit and loss statements, funding reports, profitability and gross margin, etc.) using SSRS; ran the reports monthly and distributed them to the respective departments through mail server subscriptions.
  • Used SAS/SQL to pull data out of databases and aggregate it to provide detailed reporting based on user requirements
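
A minimal pyodbc sketch of the kind of tuned T-SQL pull described above; the server, database, table, and column names are all illustrative assumptions:

```python
# Run a selective, parameterized T-SQL query instead of SELECT * over the table.
import pyodbc

# Hypothetical local SQL Server; swap in real server/credentials as needed.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=Finance;Trusted_Connection=yes;"
)
cur = conn.cursor()

# Select only the columns the report needs and filter on an indexed
# date column, which is where the index/normalization work pays off.
cur.execute(
    """
    SELECT pl.StatementId, pl.Department, pl.NetProfit
    FROM dbo.ProfitLoss AS pl
    WHERE pl.PeriodEnd >= ?
    """,
    "2013-01-01",
)
for row in cur.fetchall():
    print(row.StatementId, row.Department, row.NetProfit)

conn.close()
```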

Environment: SSIS, SQL, Java, PL/SQL, T-SQL, SAS, REST APIs, SSRS, Agile/Scrum, SharePoint 2010, Visual Studio 2010, DB2, SQL Server Management Studio, Oracle
