Sr. Data Engineer Resume
Overland Park, Kansas
SUMMARY
- 7+ years of experience as a Data Engineer with analysis, design, development, testing, customization, bug fixes, enhancement, support, and implementation of various web, stand-alone, and client-server enterprise applications using MySQL Workbench, Python, and Django across various domains.
- Strong hands-on experience with the Hadoop framework and its ecosystem, including HDFS architecture, MapReduce programming, Hive, Pig, Sqoop, HBase, ZooKeeper, Couchbase, Storm, Solr, Oozie, Spark, PySpark, Scala, Flume, and Kafka.
- Experienced with the full software development life cycle (SDLC), architecting scalable platforms, object-oriented programming (OOP), and database design.
- Experience working with different databases/data warehouses such as Teradata, Oracle, AWS Redshift, and Snowflake.
- Experienced in managing and reviewing Hadoop log files and running Hadoop streaming jobs to process terabytes of XML-format data.
- Good experience with the BI reporting tool IBM Cognos (Framework Manager, Report Studio, Cognos Workspace, Analysis Studio, Query Studio).
- Experience building and architecting multiple data pipelines, including ETL and ELT processes for data ingestion and transformation in GCP using BigQuery, Dataproc, Cloud SQL, and Datastore.
- Experience with Google Cloud Platform (GCP) components, Google container builders, GCP client libraries, and cloud SDKs.
- Experience building data pipelines from Kafka queues to the ELK stack 5.1.2, including setting up monitoring using Elastic X-Pack.
- Extensive experience as an ETL developer using Informatica PowerCenter 9.6.1, 9.5, 8.x, and 7.x.
- Solid experience with big data on AWS cloud services, i.e., EC2, S3, Glue, Athena, DynamoDB, and Redshift.
- Experience with job/workflow scheduling and monitoring tools like Oozie, AWS Data Pipeline, and AutoSys.
- Experience with Unix/Linux systems, shell scripting, and building data pipelines.
- Hands-on experience writing applications on NoSQL databases like HBase, Cassandra, and MongoDB.
- Experience with Web Development, Amazon Web Services, Python, and the Django framework.
- Good experience in developing web applications implementing MVT/MVC architecture using Django.
- Experience working on various applications using Python IDEs such as Sublime Text, PyCharm, NetBeans, PyDev, and Spyder.
- Experience analyzing very large, complex, multi-dimensional datasets and developing analytic solutions, including predictive analytics using Python.
- Experienced working with Tableau and Grafana to produce different views of data visualizations and present dashboards on web and desktop platforms to end users, helping them make effective business decisions.
- Experience with Snowflake Multi-Cluster Warehouses.
- Experience with Snowflake Virtual Warehouses and building Snowpipe.
- Experience using Spark and Amazon Machine Learning (AML) to build ML models; experience with cloud platforms (Azure and AWS).
- Experience with shell scripting to run ETL mappings and handle files.
- Experienced working with Hadoop/big data storage and analytical frameworks on the Azure cloud.
- Strong knowledge of and experience with AWS cloud services like EC2 and S3.
- Experience writing subqueries, stored procedures, triggers, cursors, and functions on MySQL and PostgreSQL databases.
- Experienced in implementing CTEs, recursive CTEs, temp tables, and effective DDL/DML triggers to facilitate efficient data manipulation and data consistency and to support existing applications.
- Hands-on experience handling database issues and connections with SQL and NoSQL databases like MongoDB, Cassandra, Redis, CouchDB, and DynamoDB by installing and configuring various Python packages.
TECHNICAL SKILLS
Sr. Data Engineer
Confidential - Overland Park, Kansas
Responsibilities:
- Developed an in-time peak/valley alert monitoring system using Teradata, SQL, and Python; designed the full logic of the statistical model, metric settings, threshold algorithm, and auto-triggered email notification system.
- Built a series of functions to clean, repopulate, and pivot the original database in Teradata using SQL queries.
- Designed the HTML interface and content for the sample alert email, including a table containing key information, an attached dashboard chart generated with Python, and a link to the terminal dashboard on an internal platform.
- Developed and tuned Hadoop MapReduce programs to analyze the data, populate stage tables, and store the refined data into partitioned tables in the EDW.
- Developed internal APIs using Node.js and used MongoDB for fetching the schema. Worked with Node.js for developing server-side web applications. Implemented views and templates with Django's view controller and the Jinja templating language to create a user-friendly website interface.
- Developed several Informatica mappings, mapplets, and transformations to load data from relational and flat file sources into the data mart.
- Handled end-to-end development, from developing APIs in Django and the frontend in React to deploying various features in the client's accounting system.
- Worked in an Agile environment, including the Scrum methodology, within a cross-functional team and acted as a liaison between the business user group and the technical team.
- Worked with the DevOps team to set up serverless pipelines for Lambda code deployment.
- Worked on Databricks notebooks for interactive analytics using Spark APIs.
- Loaded data into S3 buckets using AWS Glue and PySpark. Involved in filtering data stored in S3 buckets using Elasticsearch and loaded data into Hive external tables.
- Utilized the Kafka Connect framework to connect to an Oracle database to push and pull data to/from Kafka.
- Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket, or to HTTP requests using Amazon API Gateway (see the Lambda sketch at the end of this section).
- Tested imported custom profile data in Elasticsearch indexes (Lucene queries) against source data in Google BigQuery and data in new Iterable API nodes.
- Set up Confluent Kafka clusters v5.4.1 (15 nodes) on bare-metal servers using Ansible.
- Worked on a complex ETL design, successfully extracting data from 40+ sources.
- Worked with CI/CD tools such as Jenkins and version control tools Git and Bitbucket.
- Created continuous data ingestion pipelines using Snowpipe, Snowflake Streams, and Snowflake Tasks.
- Integrated AWS S3 buckets with Snowflake using Snowpipe.
- Built real-time reporting infrastructure using IBM Cognos and an advanced Excel reporting application to provide real-time insights into fund movement, market impact, and business KPIs.
- Designed an automation process for daily/weekly ETL routines and created ETL design documents and flow diagrams.
- Worked with Spark, PySpark, and the Python Pandas and NumPy packages for data cleaning and data manipulation.
- Implemented Python-based k-means clustering via PySpark to analyze the spending habits of different customer groups (see the clustering sketch after this list).
- Set up Grafana dashboards for Postgres and Cassandra databases.
- Extensively used Hive/HQL queries to query data in Hive tables and loaded data into HBase tables.
- Worked on ETL testing and used the SSIS Tester automated tool for unit and integration testing.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Worked successfully in an Agile project across multiple feature and component teams.
- Worked extensively with dimensional modeling, data migration, data cleansing, and ETL processes for data warehouses.
- Worked extensively on developing test plans and test strategies for ETL, reporting, SharePoint, and web application projects.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Built a compliance and data quality checks pipeline using Airflow, SQL, Teradata, and cloud functions.
- Developed Spark applications using Spark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into customer usage patterns.
- Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Designed and built DevOps pipelines to provide CI/CD using TeamCity, Git, and Maven to reduce build and deployment times, increase automation, and efficiently analyze code and security vulnerabilities prior to production.
- Developed and deployed data pipelines in the cloud on AWS and GCP, storing data files in Google Cloud Storage buckets on a daily basis; used Dataproc and BigQuery to develop and maintain GCP cloud-based solutions.
- Built data pipelines using the Apache Beam framework in GCP for ETL-related jobs with different Airflow operators.
- Wrote complex SQL scripts to analyze data in different databases/data warehouses such as Snowflake, Teradata, and Redshift.
- Built, created, and configured enterprise-level Snowflake environments; implemented, maintained, and monitored Snowflake environments.
- Migrated scripts and programs to the AWS cloud environment.
- Automated the process of sending data quality alerts to a Slack channel and email using Databricks, Python, and HTML, alerting users if there are any issues with the data.
- Performed data comparisons between SDP (Streaming Data Platform) real-time data, AWS S3 data, and Snowflake data using Databricks, Spark SQL, and Python.
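The PySpark k-means clustering mentioned above can be sketched roughly as follows. This is a minimal illustration only: the table path, the column names (customer_id, grocery_spend, travel_spend, dining_spend), and the cluster count k=4 are assumptions, not the actual campaign data.

```python
# Illustrative sketch only: paths, column names, and k=4 are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("customer-spend-clustering").getOrCreate()

# Hypothetical input: one row per customer with aggregated spend by category.
spend = spark.read.parquet("s3://example-bucket/customer_spend/")

# Assemble the numeric spend columns into a single feature vector.
assembler = VectorAssembler(
    inputCols=["grocery_spend", "travel_spend", "dining_spend"],
    outputCol="features",
)
features = assembler.transform(spend)

# Fit k-means and attach a cluster label to each customer.
model = KMeans(k=4, seed=42, featuresCol="features", predictionCol="cluster").fit(features)
clustered = model.transform(features).select("customer_id", "cluster")

# Cluster sizes give a quick read on how the customer base splits.
clustered.groupBy("cluster").count().show()
```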
Environment: Hive, ETL, Prometheus, PySpark, Talend, Jenkins, Airflow, Gerrit, Kafka, Spark, DynamoDB, Sqoop, Maven, Automic, SQL, Scala, JUnit, IntelliJ, MySQL, Databricks, Snowflake, AWS cloud, Glue, Python 3.8.
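A minimal sketch of an S3-triggered AWS Lambda handler of the kind referenced in the section above; the object-processing body and log output are placeholders, and the real functions also responded to DynamoDB table events and API Gateway HTTP requests.

```python
# Illustrative Lambda handler for S3 object-created events; processing logic is a placeholder.
import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Triggered by S3 put events; reads each new object and logs its size."""
    records = event.get("Records", [])
    for record in records:
        bucket = record["s3"]["bucket"]["name"]
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        print(f"Processed s3://{bucket}/{key} ({len(body)} bytes)")
    return {"statusCode": 200, "body": json.dumps({"processed": len(records)})}
```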
Data Engineer
Confidential - Tampa, FL
Responsibilities:
- Performed data analysis on the analytical data present in AWS S3, AWS Redshift, Snowflake, and Teradata using SQL, Python, Spark, and Databricks.
- Designed, developed, implemented, and executed marketing campaigns for US card customers using Unica Affinium Campaign, Snowflake, AWS S3, PySpark, and Databricks.
- Created scripts and programs to gain an understanding of datasets, discover data quality and data integrity issues associated with the analytical data, and perform root cause analysis for those issues.
- Worked on segmentation analytics for each campaign using database technologies both on-premises (such as SQL, Teradata, and UNIX) and on the cloud platform using AWS and big data technologies such as Spark, Python, and Databricks.
- Created custom reports and dashboards using business intelligence software like Tableau and QuickSight to present data analysis and conclusions.
- Developed data strategies, data models, and databases in an Agile development process with ceremonies including scrum, planning events, backlog grooming, retrospectives, and demos.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats for analysis and transformation.
- Worked on a Python/Django-based web application, a PostgreSQL database, and integrations with third-party email, messaging, and storage services.
- Involved in migrating datasets and ETL workloads with Python from on-premises to AWS cloud services.
- Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch, and used AWS Glue for data transformation, validation, and cleansing.
- Developed Python code to gather data from HBase and designed the solution to implement it using PySpark.
- Wrote Hive queries for data analysis to meet business requirements and designed and developed a user-defined function (UDF) for Hive.
- Created automatic Jenkins builds when pushing code in Git to a newly created branch.
- Involved in continuous integration and deployment (CI/CD) using DevOps tools like Looper and Concord.
- Implemented data ingestion strategies and scalable pipelines, data warehouse, and data mart structures in the Snowflake data platform.
- Worked on data warehousing, Hadoop HDFS, and pipelines.
- Responsible for running Hadoop streaming jobs to process terabytes of CSV data.
- Proficient in writing complex SQL queries and PL/SQL stored procedures, functions, and triggers.
- Designed and implemented data pipelines that handle high-volume data streaming.
- Worked on database designs that include data models, metadata, ETL specifications, and process flows for business data project integrations.
- Worked in a multi-cluster environment to set up the Cloudera/Hortonworks Hadoop ecosystem.
- Developed code fordataingestion and curation using Informatica IICS, Spark, and Kafka.
- Created and maintained technical documentation for launching the Hadoop cluster and for executing Hive queries.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Python (see the sketch after this list).
- Used and configured multiple AWS services like RedShift, EMR, EC2, and S3 to maintain compliance with organization standards.
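The Hive-to-Spark conversion referenced above can be illustrated as below; the Hive table name (sales) and its columns (region, amount) are placeholders rather than the actual schema.

```python
# Hypothetical example: the Hive table `sales` and its columns are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-to-spark").enableHiveSupport().getOrCreate()

# Original Hive/SQL form of the query.
hive_result = spark.sql(
    "SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region"
)

# Equivalent Spark transformation expressed over an RDD of Row objects.
rdd_result = (
    spark.table("sales").rdd
    .map(lambda row: (row["region"], row["amount"]))
    .reduceByKey(lambda a, b: a + b)
)

hive_result.show()
print(rdd_result.collect())
```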
Environment: Python 3.6, PySpark, ETL, AWS, Glue, Talend, Lambda, EC2, CloudWatch, MySQL, SQL, NoSQL, PL/SQL, Teradata, Snowflake, Hive, Agile, and Windows.
Data Engineer
Confidential - Natick, MA
Responsibilities:
- Served as a cloud data engineer, managing the client's AWS environments and Cloud Foundry open-source PaaS.
- Designed and developed moderate to complex data pipelines using AWS Glue.
- Worked on repeatable patterns for migration of data to the AWS data lake from on-premises and other cloud providers.
- Worked with AWS tools like Athena, QuickSight, and SageMaker to provide platforms for the cyber fraud and analytics team to perform necessary analytics and machine learning.
- Created pipelines, data flows, and complex data transformations and manipulations using ADF and PySpark with Databricks.
- Developed Spark code and Spark SQL/Streaming for faster testing and processing of data.
- Performed data testing, tested ETL mappings (transformation logic), tested stored procedures, and tested XML messages.
- Worked on the technical analysis of existing code in Unix/DataStage for migration to AWS.
- Used the standard CI/CD pipeline to roll out and deploy code.
- Created automated solutions using Databricks, Spark, Python, Snowflake, and HTML.
- Used backups of S3 buckets and AWS RDS databases and performed restorations as needed.
- Architected and designed static and dynamic websites using content management systems and buildpacks.
- Worked on Amazon EC2, S3, IAM, VPC, RDS, SQS, Route 53, and CloudFront, as well as automation and orchestration services (Elastic Beanstalk, CloudFormation).
- Worked with the AWS cloud platform, configured AWS EC2 instances using AMIs, and launched instances for specific applications.
- Integrated Hadoop with Active Directory and enabled Kerberos for authentication.
- Created highly available and scalable infrastructure in the AWS cloud using various AWS services like EC2, VPC, Auto Scaling, ELB, RDS, and Route 53.
- Developed merge jobs in Python to extract and load data into a MySQL database; also worked on Python ETL file loading and the use of regular expressions (see the sketch after this list).
- Developed Hive queries to process the data and generate data cubes for visualization.
- Performed both major and minor upgrades to the existing Cloudera Hadoop cluster.
- Deployed Docker engines in virtualized platforms for containerization of multiple apps.
- Responsible for Continuous Integration (CI) and Continuous Delivery (CD) process implementation using Jenkins along with Shell scripts to automate routine jobs.
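A rough sketch of a Python merge (upsert) job of the kind described above, assuming a flat file parsed with a regular expression and loaded into MySQL via PyMySQL; the file layout, regex, connection details, and orders table are all hypothetical.

```python
# Illustrative sketch: file layout, regex, table, and connection details are assumptions.
import re
import pymysql

# Hypothetical line format: "2024-01-15 ORD-12345 129.99"
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(ORD-\d+)\s+(\d+\.\d{2})$")

def parse_file(path):
    """Extract (order_date, order_id, amount) tuples, skipping malformed lines."""
    rows = []
    with open(path) as handle:
        for line in handle:
            match = LINE_RE.match(line.strip())
            if match:
                rows.append((match.group(1), match.group(2), float(match.group(3))))
    return rows

def merge_into_mysql(rows):
    """Upsert rows so reruns of the job stay idempotent (a simple 'merge')."""
    conn = pymysql.connect(host="localhost", user="etl_user",
                           password="secret", database="reporting")
    try:
        with conn.cursor() as cur:
            cur.executemany(
                "INSERT INTO orders (order_date, order_id, amount) "
                "VALUES (%s, %s, %s) "
                "ON DUPLICATE KEY UPDATE amount = VALUES(amount)",
                rows,
            )
        conn.commit()
    finally:
        conn.close()

if __name__ == "__main__":
    merge_into_mysql(parse_file("orders.log"))
```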
Environment: Python 3.0, Django, MVC, HTML5, XHTML, CSS3, JavaScript, AngularJS, PySpark, AWS, Bootstrap, AJAX, jQuery, JSON, REST, MySQL, SQL, Agile, and Windows.
Data Analyst
Confidential - NYC, NY
Responsibilities:
- Involved in designing the ETL process to extract, transform, and load data from OLAP sources to the Teradata data warehouse.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in performance tuning and ETL code review, analyzing the target-based commit interval for optimum session performance.
- Involved in an internal web search tool for bankers using the Python Flask micro framework (see the sketch after this list).
- Involved in maintaining two data warehouses in the banking domain.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
- Used stored procedures to create database automation scripts for creating databases in different environments.
- Responsible for troubleshooting, identifying, and resolving data problems; worked with analysts to determine data requirements and identify data sources, and provided estimates for task duration.
- Involved in unit testing, system testing, integration testing, data validation, and user acceptance testing.
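A minimal sketch of a Flask search endpoint along the lines of the internal web search tool mentioned above; the in-memory document list and the /search route are placeholders, not the bank's actual data source.

```python
# Minimal Flask search sketch; documents and route name are placeholders.
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical documents that would normally come from a database or search index.
DOCUMENTS = [
    {"id": 1, "title": "Mortgage rate sheet"},
    {"id": 2, "title": "Quarterly risk report"},
    {"id": 3, "title": "Customer onboarding checklist"},
]

@app.route("/search")
def search():
    """Return documents whose title contains the query string (?q=...)."""
    query = request.args.get("q", "").lower()
    hits = [doc for doc in DOCUMENTS if query in doc["title"].lower()]
    return jsonify(results=hits, count=len(hits))

if __name__ == "__main__":
    app.run(debug=True)
```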
Environment: Hadoop MapReduce, HDFS, Teradata, SSIS, SSRS, Oracle, MS SQL Profiler, JavaScript, XML, Erwin, MS Office, Tableau Desktop, Tableau Server.
