Data Engineer Resume
Austin, TX
SUMMARY
- 7+ years of experience in Cloud Computing, Build/Release Management, Cloud Resource Utilization, Infrastructure Automation, Unix/Linux Administration, and Network Security, including designing, developing, and running Continuous Integration, Continuous Delivery, Continuous Deployment, and Continuous Monitoring of enterprise-level distributed applications.
- Experience provisioning and administering EC2 instances, using Amazon S3 for storage, backup, archiving, and recovery, accessing S3 via Amazon VPC, and configuring Route 53 for DNS management and internet traffic routing (a provisioning sketch follows this summary).
- Experienced in Terraform, writing modules to spin up Amazon EC2 instances, orchestrate AWS S3 and AWS ECS, create EBS volumes, stacks, and IAM EC2 roles, and scale Puppet on AWS EC2 with Terraform.
- Experience with Kubernetes for load balancing, scaling, Helm charts, managing Docker containers, cluster administration, and creating pods and containers; ran Kubernetes on Azure and AWS EC2.
- Experience creating a Jenkinsfile in the root of a repository using Groovy to define the Jenkins pipeline.
- Experience with the Alteryx platform, involved in data preparation, data blending, and the creation of data models and datasets using Alteryx.
- Experience in real-time monitoring and alerting of applications deployed in AWS using CloudWatch, Nagios, the ELK stack, and Splunk.
- Hands-on experience with Agile and Kanban methodologies.
- Worked with user interface applications using TL1, CLI, WebUI, and Java-based client/server applications.
- Proficient with Shell, Python, Ruby, Perl, PowerShell, JSON, YAML, and Groovy scripting languages in the context of cloud-based technologies.
- Experience in creating sessions, tasks, and workflows using Alteryx.
- Experience integrating unit tests and code quality analysis tools such as SonarQube, MSTest, JUnit, and Selenium.
- Experience with SQL, PL/SQL, and database objects such as stored procedures, functions, packages, and triggers, using features like bulk binds, inline views, and global temporary tables to optimize performance.
- Experienced in Agile testing methodologies and the full Software Test Life Cycle (STLC).
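A minimal boto3 sketch of the EC2/S3 provisioning work described above; the region, AMI ID, key pair, and bucket name are illustrative placeholders, not values from any real account.

```python
# Hypothetical sketch: provisioning an EC2 instance and an S3 backup bucket with boto3.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
s3 = boto3.client("s3", region_name="us-east-1")

# Launch a single t3.micro instance from a placeholder AMI.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    KeyName="example-keypair",         # placeholder key pair
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "example-app-server"}],
    }],
)
instance_id = response["Instances"][0]["InstanceId"]

# Create a bucket for backups/archives (bucket names must be globally unique).
s3.create_bucket(Bucket="example-backup-bucket-0001")
print(f"Launched {instance_id} and created the backup bucket")
```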
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential - Austin, TX
Responsibilities:
- Worked with a functional team and data modelers/architects to identify and understand the data from different source systems.
- Participated in meetings with the Business analysts to gather and analyze the requirements.
- Developed PL/SQL scripts for loading data from the OLTP schema to the OLAP schema.
- Processed raw JSON, XML, CSV, and text data after cleansing the data.
- Database design and base model: logical and physical design, with hands-on experience in DDL and DML SQL operations.
- Used database tools such as SQL Access Advisor, SQL Performance Analyzer, and SQL Tuning Advisor for performance tuning of SQL issued from Python.
- Used workflow repositories to investigate, standardize, match, and survive records for data quality and data profiling issues during design.
- Created DataStage workflows (ETL processes) using PowerCenter Designer, constantly populating the data warehouse from different source systems such as ODS and flat files.
- Wrote SQL queries and PL/SQL code against source tables and data mart staging tables to validate the data by comparing row counts between the tables.
- Worked with Alteryx apps/macros to automate ETL workflows for daily/weekly reports.
- Wrote UNIX shell scripts to run jobs in multiple instances using a parameter file.
- Extensively wrote SQL/PL/SQL queries to extract data from various source systems and populate SAP reports.
- Extensive experience working with the AWS cloud platform (EC2, S3, EMR, Redshift, Lambda, and Glue).
- Working knowledge of Spark RDDs, the DataFrame API, Dataset API, Data Source API, Spark SQL, and Spark Streaming.
- Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Good understanding of Cassandra architecture, replication strategies, gossip, and snitches.
- Designed column families in Cassandra, ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per the business requirements.
- Used the Spark Cassandra Connector to load data to and from Cassandra, as sketched in the PySpark example after this list.
- Worked from scratch on Kafka configuration, including managers and brokers.
- Experienced in creating data models for clients' transactional logs and analyzing the data from Cassandra tables for quick searching, sorting, and grouping using the Cassandra Query Language (CQL).
- Tested cluster performance using the cassandra-stress tool to measure and improve reads/writes.
- Stored data in Hive to perform data analysis that meets the business specification logic.
- Used Apache Kafka to aggregate web log data from multiple servers and make it available in downstream systems for data analysis and engineering roles.
- Worked on implementing Kafka security and boosting its performance.
- Designed UNIX scripts to automate verification of the information inserted in the database; used Python libraries and SQL queries/subqueries to create several datasets that produced statistics, tables, charts, and graphs.
- Developed custom UDFs in Python and used them for sorting and preparing the data (see the UDF sketch below).
- Worked on custom loaders and storage classes in Pig to handle several data formats such as JSON, XML, and CSV, and generated bags for processing in Pig.
- Developed Sqoop and Kafka jobs to load data from RDBMS and external systems into HDFS and Hive.
- Executed scripts in the QA environment and validated transformations of high-volume data using SQL and UNIX commands.
- Wrote several MapReduce jobs using PySpark and NumPy and used them in continuous integration.
- Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.
- Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
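A minimal PySpark sketch of the RDBMS-to-Cassandra flow referenced above. It assumes the spark-cassandra-connector package is available on the Spark classpath; the JDBC URL, credentials, keyspace, table, and column names are illustrative rather than taken from the project.

```python
# Sketch: ingest transactional data over JDBC, transform it, and write to Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("rdbms-to-cassandra")
    .config("spark.cassandra.connection.host", "cassandra-host")  # placeholder host
    .getOrCreate()
)

# Read transactional data from an RDBMS over JDBC (placeholder connection details).
txns = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/sales")
    .option("dbtable", "transactions")
    .option("user", "etl_user")
    .option("password", "***")
    .load()
)

# Example transformation: normalize timestamps and keep only completed transactions.
cleaned = (
    txns.withColumn("txn_date", F.to_date("created_at"))
        .filter(F.col("status") == "COMPLETED")
)

# Write the transformed data to a Cassandra column family via the connector.
(
    cleaned.write.format("org.apache.spark.sql.cassandra")
    .options(table="transactions_by_date", keyspace="analytics")
    .mode("append")
    .save()
)
```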
Environment: Spark, Spark Streaming, Spark SQL, AWS EMR, UNIX, MapR, HDFS, Hive, Pig, Apache Kafka, Sqoop, Python, PySpark, shell scripting, Linux, MySQL, Oracle, EnterpriseDB, Solr, Jenkins, Eclipse, Git, Oozie, Tableau, SOAP, Cassandra, and Agile methodologies.
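A short sketch of the kind of custom Python UDF work referenced in the responsibilities above; the normalization rule and column names are hypothetical.

```python
# Sketch: a Python UDF used to prepare and sort data in Spark.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

@udf(returnType=StringType())
def normalize_region(raw):
    """Trim, upper-case, and bucket free-text region codes."""
    if raw is None:
        return "UNKNOWN"
    value = raw.strip().upper()
    return value if value in {"NORTH", "SOUTH", "EAST", "WEST"} else "OTHER"

df = spark.createDataFrame(
    [("  north ", 10), ("WEST", 20), (None, 30)],
    ["region", "amount"],
)

# Apply the UDF, then sort on the cleaned column before downstream processing.
prepared = df.withColumn("region_clean", normalize_region(col("region"))) \
             .orderBy("region_clean")
prepared.show()
```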
Data Analyst
Confidential - Boston, MA
Responsibilities:
- Worked in an Agile methodology, closing tasks in a two-to-three-week sprint model.
- Worked with Spark to improve the performance and optimization of existing algorithms in Hadoop.
- Used SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
- Used Spark Streaming APIs to perform on-the-fly transformations and actions.
- Developed a Kafka consumer API in Python for consuming data from Kafka topics (see the consumer sketch after this list).
- Developed Tableau workbooks from multiple data sources.
- Developed Tableau visualizations and dashboards using Tableau Desktop.
- Consumed Extensible Markup Language (XML) messages using Kafka and processed the XML files using Spark Streaming to capture user interface (UI) updates.
- Kept track of existing production defects in legacy applications and documented them in the enterprise Agile Central repository.
- Migrated data using the Azure Database Migration Service (DMS).
- Developed various Alteryx workflows to automate cash balances and gross notional exposure and to build reports for risk and operational metrics.
- Developed a pre-processing job using Spark DataFrames to flatten JSON documents to flat files.
- Loaded DStream data into Spark RDDs and performed in-memory data computation to generate the output response.
- Migrated an existing on-premises application to AWS. Used AWS services like EC2 and S3 for dataset processing and storage.
- Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
- Configured Snowpipe to pull data from S3 buckets into Snowflake tables (see the Snowflake load sketch below).
- Stored incoming data in the Snowflake staging area.
- Created numerous ODI interfaces and loaded data into Snowflake.
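A minimal sketch of a Python Kafka consumer along the lines described above, assuming the kafka-python package; the topic, broker, and consumer group names are placeholders.

```python
# Sketch: consume JSON messages from a Kafka topic and hand them downstream.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "ui-updates",                               # placeholder topic
    bootstrap_servers=["broker1:9092"],         # placeholder broker
    group_id="analytics-consumers",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)

for message in consumer:
    event = message.value
    # Downstream processing would go here (e.g. write to a staging store).
    print(message.topic, message.partition, message.offset, event.get("event_type"))
```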
Environment: Hadoop, Spark, Scala, HBase, Hive, UNIX, Erwin, TOAD, MS SQL Server, XML files, AWS, Cassandra, MongoDB, Kafka, PL/SQL, Oracle 12c, flat files, Autosys, MS Access.
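Snowpipe itself is configured in SQL rather than Python, so the sketch below shows the equivalent manual load from an S3 external stage using the snowflake-connector-python package, as referenced in the responsibilities above; the account, warehouse, stage, and table names are placeholders.

```python
# Sketch: run a COPY INTO from an S3 external stage into a Snowflake staging table.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ETL_USER",
    password="***",
    account="example_account",   # placeholder account identifier
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # @S3_STAGE is assumed to be an external stage pointing at the S3 bucket.
    cur.execute(
        "COPY INTO STAGING.CASH_BALANCES "
        "FROM @S3_STAGE/cash_balances/ "
        "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1)"
    )
    print(cur.fetchall())   # per-file load results
finally:
    conn.close()
```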
Data Analyst
Confidential
Responsibilities:
- Created SQL and Teradata BTEQ scripts, mapped ETL processes, and enabled transformations.
- Developed ETL processes in the data warehouse, creating reliable data pipelines.
- Used Alteryx Designer to extract data from multiple sources, integrate disparate data into a common data model, and load data into a target database, application, or files using efficient workflows.
- Worked on performance tuning of tables by creating indexes and partitions.
- Worked on different data load layers and supported different database systems.
- Leveraged big data technologies and Relational Database Management Systems (RDBMS) to develop and maintain semantic layers of data in support of BI tools.
- Developed generic test cases to test application functionality and mappings implemented with FDD logic.
- Designed and developed ETL workflows and datasets in Alteryx.
- Created and scheduled loads using Control-M and analyzed failures.
- Participated in requirement review meetings with business analysts, developers, and business users to review the main business processes, workflows, tasks, user roles, inputs, and outputs.
- Involved in requirements gathering, business analysis, code development, testing, and implementation of business requirements.
- Created processes that enhance operational workflows and provide positive customer impact.
- Worked with Unix/Linux systems, with scripting experience, and built data pipelines.
- Built AWS Data Pipelines for data migration from one database to another (see the migration sketch after this list).
- Built platform features and user stories and maintained a prioritized backlog using the Rally and JIRA platforms.
- Groomed user stories and defined user acceptance criteria for the base data management project.
- Reviewed the SQL for missing joins, join constraints, data format issues, and casting errors.
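An illustrative sketch of a simple table-to-table migration step of the kind described above, using pandas and SQLAlchemy; the connection strings and table names are placeholders, and a production pipeline would add validation and retries.

```python
# Sketch: move one table from a source database to a target database in chunks.
import pandas as pd
from sqlalchemy import create_engine

source = create_engine("postgresql+psycopg2://etl_user:***@source-db:5432/sales")
target = create_engine("postgresql+psycopg2://etl_user:***@target-db:5432/warehouse")

# Stream the table in chunks to keep memory usage bounded.
for chunk in pd.read_sql_query("SELECT * FROM orders", source, chunksize=50_000):
    chunk.to_sql("orders", target, schema="staging", if_exists="append", index=False)
```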
Environment: Python, Databricks, AWS, Hadoop, Snowflake, Tableau, Excel.