Data Engineer Resume
Union City, NJ
SUMMARY
- Around 7 years of experience as a Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
- Strong experience across the complete data warehouse life cycle using the SQL Server suite of products: SSIS, SSRS, SSAS, and MDX.
- Experience in loading and transforming data into HDFS from large sets of structured data in Oracle, SQL Server, and Teradata using Sqoop.
- Experience in writing complex stored procedures, triggers, functions, views, and queries in SQL and T-SQL using SSMS and SSIS.
- Experience building several Airflow DAGs to schedule Data Factory and Databricks jobs.
- Good working knowledge of AWS cloud services such as EMR, S3, Redshift, and CloudWatch for big data development.
- Strong experience with big data processing using Hadoop technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
- Good knowledge of streaming applications using Apache Kafka.
- Expertise in Python and shell scripting; experienced in writing Spark scripts in Python, Scala, and SQL for development and analysis.
- Experience in writing complex Spark programs in Python with Hive integration, and an in-depth understanding of programming concepts.
- Good experience in implementing and orchestrating data pipelines using Oozie and Airflow.
- Experience migrating legacy data warehouses and other databases to Snowflake.
- Good knowledge of the AWS ecosystem, including Spark on AWS, Snowflake, Lambda, Redshift, DMS, EMR, RDS, and EC2, with strong experience on the AWS cloud computing platform.
- Experience with data retention requirements; created an AWS Glue job for archiving data from Redshift tables to S3 (online to cold storage).
- Experience with the Google Cloud ecosystem, including BigQuery, Bigtable, Dataproc, Dialogflow, Cloud Storage, Looker, and IAM policies.
- Experience in developing visualization and KPI reports using Tableau and Looker.
- Hands-on experience with Google Cloud services such as BigQuery, GCS buckets, and Cloud Functions.
- Exposure to data visualization using multiple BI (Business Intelligence) tools including Tableau, Power BI, and Azure Databricks.
- Experience processing and loading bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python (see the sketch at the end of this summary).
- Hands-on experience in both development and support projects.
- Extensively worked on data extraction, transformation, and loading (ETL) using SSIS.
- Experience in the design, development, and deployment of analytical and business intelligence solutions using big data technologies: Hive, Spark (Dataiku, PySpark), Alteryx, and SSRS.
- Advanced experience designing and developing ETL workflows and reports with Dataiku and Alteryx.
- Proficient in writing Bash, Perl, and Python scripts for automation and control flow.
- Experience deploying reports to different servers (Report Server and SharePoint integrated mode).
- Hands-on experience with Agile and Kanban methodologies.
- Good experience optimizing complex stored procedures and tuning queries.
- Strong experience creating reports and dashboards with the Power BI data visualization tool.
- Strong experience publishing reports and granting access to end users.
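A minimal sketch of the Pub/Sub-to-BigQuery Dataflow pipeline referenced above, written with the Apache Beam Python SDK; the project, bucket, topic, table, and schema names are illustrative placeholders rather than values from an actual engagement.

```python
# Streaming Dataflow (Apache Beam) sketch: read JSON messages from a Pub/Sub
# topic and append them to a BigQuery table. All resource names are hypothetical.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run():
    options = PipelineOptions(
        project="example-project",                 # hypothetical GCP project
        runner="DataflowRunner",
        temp_location="gs://example-bucket/tmp",   # hypothetical staging bucket
        region="us-central1",
    )
    options.view_as(StandardOptions).streaming = True  # unbounded (streaming) source

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/example-topic")
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```

The same pipeline can be exercised locally for testing by switching the runner to DirectRunner.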
PROFESSIONAL EXPERIENCE
Data Engineer
Confidential, Union City, NJ
Responsibilities:
- Functioned as a Data Engineer responsible for data modeling, data migration, design, and preparing ETL pipelines for both the cloud and Exadata.
- Created the linked services, datasets, and pipelines.
- Created automated workflows with the help of triggers.
- Developed and orchestrated several Airflow DAGs that invoke AWS services (see the sketch after this section).
- Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift, and S3.
- Developed predictive analytics using Apache Spark Scala APIs.
- Helped with the migration from the old server to the Jira database (matching fields) using Python scripts to transfer and verify the information.
- Designed a Python API to connect to the Azure Data Lake service to store and retrieve the mobile files.
- Performed data validations using Scala Spark scripts by capturing the raw data on HDFS and comparing it with data in the final tables.
- Implemented the Looker BI platform and integration services for internal ops team and client reporting.
- Extracted, transformed, and loaded data from S3 buckets using AWS Glue, processing and storing the data in Redshift.
- Used GCP services such as Cloud Storage, Dataflow, Cloud Functions, Dataproc, Cloud Run, Cloud Composer, Kubernetes, and BigQuery.
- Wrote multiple batch jobs for processing hourly and daily data received through multiple sources such as Adobe, No-
- Created complex Presto SQL queries to consume data into Tableau for data visualization.
- Worked on downloading BigQuery data into pandas or Spark dataframes for advanced ETL capabilities.
- Used automated tools (Tableau and Power BI) to extract data from primary and secondary sources.
- Exported data into Snowflake by creating staging tables to load data from different files in Amazon S3.
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Developed a data pipeline using Airflow and Python to ingest current and historical data into the data staging area.
- Developed entire front-end and back-end modules using Python on the Django and Flask web frameworks with Git.
- Worked on developing a new data quality (DQ) framework for the ETL workflow using Spark and Python.
- Worked on generating dynamic CASE statements for Hive scripts with Python, based on Excel rules provided by the business.
- Executed Hadoop/Spark jobs on AWS EMR using program data stored in S3 buckets.
- Analyzed the data with Hive queries (HiveQL) to study customer behavior.
- Produced AWS CloudFormation templates to create the custom infrastructure for our pipeline.
- Designed AWS Glue pipelines to ingest, process, and store data, interacting with different services in AWS.
- Used the AWS services S3, RDS, and Redshift for data storage.
- Monitored the daily jobs and ADF pipelines.
- Worked with MapReduce programs using Apache Hadoop for big data processing.
- Created Power BI reports and dashboards per the client requirements.
- Published the Power BI reports and provided access to end users.
- Implemented SSAS cube action changes for drill-through reports.
- Implemented dashboards, apps, and alerts in the Power BI cloud service.
- Provided production support for customers by troubleshooting issues.
- Interacted with team members based in diverse global locations.
Environment: Python, SQL, AWS, Airflow, Snowflake, Looker, Redshift, BigQuery, Tableau, AWS Glue, AWS Athena, AWS S3, EC2, Oracle, MySQL, MongoDB, AWS Lambda, Hadoop, Kafka, Beautiful Soup, Kubernetes, Docker, Jenkins, Maven, Git, Jira, Agile, Visual Studio, Windows, Apache Spark.
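A minimal sketch of an Airflow DAG in the spirit of the ones described in this role: a daily task that moves an S3 partition with boto3. The DAG id, bucket, and key prefixes are hypothetical placeholders, not values from an actual engagement.

```python
# Airflow DAG sketch: one daily PythonOperator task that copies the day's raw
# file to a "processed" prefix in S3. All names are hypothetical.
from datetime import datetime, timedelta

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def copy_s3_partition(ds, **_):
    """Copy the execution date's raw file to a processed prefix (illustrative step)."""
    s3 = boto3.client("s3")
    s3.copy_object(
        Bucket="example-data-lake",
        CopySource={"Bucket": "example-data-lake", "Key": f"raw/{ds}/events.csv"},
        Key=f"processed/{ds}/events.csv",
    )


default_args = {"owner": "data-eng", "retries": 1, "retry_delay": timedelta(minutes=5)}

with DAG(
    dag_id="daily_s3_ingest",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    ingest = PythonOperator(
        task_id="copy_s3_partition",
        python_callable=copy_s3_partition,
    )
```

Downstream steps (for example a Redshift COPY or a Glue job trigger) would be added as further tasks and chained after `ingest`.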
Data Engineer
Confidential, Phoenix, AZ
Responsibilities:
- Involved in all the phases of the software development life cycle: evaluating the business requirements, conceptualizing ideas, and transforming them into software functionality.
- Developed batch and streaming pipelines using GCP services, Docker, Kubernetes, and Cloud Composer (Airflow).
- Performed impact analysis and designed plans to accommodate the changes requested by the clients.
- Identified the root cause of any job failures and informed the IMOPS team.
- Responsible for creating databases, tables, clustered/non-clustered indexes, unique/check constraints, views, stored procedures, and triggers.
- Worked on fixing the problem tickets.
- Developed the AWS strategy and planned and configured S3, security groups, IAM, EC2, EMR, and Redshift.
- Built products and delivered analysis using SQL (Snowflake, Amazon Redshift) and visualization tools (Periscope, Tableau).
- Tuned SQL pools containing integrated user data by managing statistics.
- Involved in developing DAGs using the Airflow orchestration tool and monitored the weekly processes.
- Processed location and segment data from S3 into Snowflake using tasks, streams, pipes, and stored procedures.
- Handled importing data from various data sources, performed data control checks using Spark, and loaded the data into HDFS.
- Worked on both analytics and operational reporting on the Looker platform.
- Built dashboards on Looker and have a good understanding of Looker concepts.
- Built GCP data and analytics services such as BigQuery and BigQuery model training/serving, and managed Spark and Hadoop services using Dataproc in GCP.
- Used REST APIs with Python to ingest data from external sites into BigQuery (see the sketch after this section).
- Developed and maintained front-end analytics tools and dashboards with Tableau, providing productive insights into operational performance and other key business performance metrics.
- Automated workflows that were previously initiated manually, using Python scripts and UNIX shell scripting.
- Fixed data issues in a timely manner; used the Python Boto3 SDK to develop Lambda functions in AWS.
- Worked on new SSIS packages and implemented changes to existing ones as per the requirements.
- Worked on the SSAS cube scripts and deployed the SSAS cubes.
- Involved in client meetings to discuss the requirements.
- Worked on Power BI reports and dashboards per the client requirements.
- Worked with the documentation team and release management team.
- Prepared the high- and low-level design documents for the solution.
- Implemented and amended the data warehouse (SSIS) and cubes (SSAS) per business requirements.
- Used various transformation tasks to create ETL packages for data conversion.
- Actively involved in developing complex SSRS reports involving sub-reports, matrix/tabular reports, charts, and graphs based on MDX queries.
Environment: SQL Server, Hadoop, Oracle, Airflow, SSIS, SSRS, Snowflake, Python, Looker, SharePoint, MDX, SSAS, Tableau, Redshift, BigQuery, Apache Spark.
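A minimal sketch of the REST-to-BigQuery ingestion described in this role, using the requests library and the google-cloud-bigquery client with application-default credentials; the endpoint URL and table id are hypothetical placeholders.

```python
# Pull JSON rows from a REST endpoint and stream them into a BigQuery table.
# The URL and table id are hypothetical; rows are assumed to match the schema.
import requests
from google.cloud import bigquery


def ingest(url: str, table_id: str) -> None:
    # Fetch JSON records from the (hypothetical) REST API.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    rows = response.json()  # expected: a list of dicts matching the table schema

    # Stream the rows into BigQuery; insert errors are reported per row.
    client = bigquery.Client()
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery insert errors: {errors}")


if __name__ == "__main__":
    ingest(
        url="https://api.example.com/v1/events",          # hypothetical endpoint
        table_id="example-project.analytics.raw_events",  # hypothetical table
    )
```

For larger batches, a load job (for example `load_table_from_json`) would be the cheaper alternative to streaming inserts.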
Software Developer
Confidential, York, PA
Responsibilities:
- Worked on designing the control and data flow architecture, designing the table structures in SQL, and creating the stored procedures.
- Provided demos to the client on the functionality of the interface.
- Prepared the technical specification of the interface.
- Promoted the interface to integration, acceptance, and then production.
- Created the Spark job to process the data and loaded it into Amazon Redshift.
- Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, writing SQL queries against Snowflake (see the sketch after this section).
- Developed Airflow DAGs for data ingestion from AWS S3 into AWS Redshift tables.
- Managed datasets using pandas dataframes and MySQL; queried the MySQL database from Python using the Python MySQL connector and the MySQLdb package to retrieve information.
- Worked with MapReduce programs using Apache Hadoop for big data processing.
- Created reports in a cloud-based environment using Amazon Redshift and published them to Tableau.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Rewrote scripts from Execute SQL tasks into Lookup activities, and complex ones into Stored Procedure activities, with linked services to Azure SQL Databases.
- Helped DevOps engineers deploy code and debug issues.
- Developed ETL data pipeline functions for unstructured text and categorical data (natural language processing).
Environment: ER/Studio, SQL, Airflow, Python, Hadoop, APIs, OLAP, OLTP, PL/SQL, Oracle, Teradata, BI, Tableau, ETL, SSIS, SSAS, SSRS, T-SQL, Snowflake, Looker, Apache Spark, Redshift, BigQuery.
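A minimal sketch of the Python-plus-Snowflake pattern described in this role: connect with the Snowflake Python connector, bulk-load previously staged S3 files with COPY INTO, and verify the load. The account, credentials, stage, and table names are hypothetical placeholders.

```python
# Load staged S3 files into Snowflake and run a verification query from Python.
# All connection values, stage names, and table names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",   # hypothetical account identifier
    user="example_user",
    password="example_password",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="STAGING",
)

try:
    cur = conn.cursor()
    # Bulk-load files from an external S3 stage assumed to already exist.
    cur.execute("""
        COPY INTO staging.orders
        FROM @s3_orders_stage
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    # Simple verification query against the loaded table.
    cur.execute("SELECT COUNT(*) FROM staging.orders")
    print("rows loaded:", cur.fetchone()[0])
finally:
    conn.close()
```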
Software Engineer
Confidential, Nashville, TN
Responsibilities:
- Developed reports per the client requirements in the form of matrix, table, drill-through, and sub-reports.
- Created SSIS packages using different data transformations such as Derived Column, Lookup, Conditional Split, Merge Join, and Aggregation.
- Ran Python MapReduce code to validate the data in HDFS (see the sketch after this section).
- Worked on unit testing and deploying the packages.
- Worked with Airflow as a scheduling and orchestration tool.
- Created the procedures, functions, and views per SSIS and report requirements.
- Deployed the reports to Report Manager.
- Used Looker reports to bring in data from the client.
Environment: SQL Server, Airflow, Apache Spark, Python, Integration Services (SSIS), Snowflake, Looker, SQL Server Reporting Services, Tableau.
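A minimal sketch of the kind of Python MapReduce validation referenced in this role, written for Hadoop Streaming; the pipe delimiter and expected field count are illustrative assumptions.

```python
# Hadoop Streaming validation sketch: the mapper tags each pipe-delimited record
# as valid or invalid, and the reducer sums the counts per tag.
import sys
from collections import defaultdict

EXPECTED_FIELDS = 5  # assumed record width


def mapper(stream=sys.stdin):
    # Emit "valid\t1" or "invalid\t1" for every input record.
    for line in stream:
        fields = line.rstrip("\n").split("|")
        status = "valid" if len(fields) == EXPECTED_FIELDS else "invalid"
        print(f"{status}\t1")


def reducer(stream=sys.stdin):
    # Sum the counts emitted by the mapper, one total per tag.
    counts = defaultdict(int)
    for line in stream:
        key, value = line.rstrip("\n").split("\t")
        counts[key] += int(value)
    for key, total in counts.items():
        print(f"{key}\t{total}")


if __name__ == "__main__":
    # Run as `python validate.py map` or `python validate.py reduce`.
    mapper() if sys.argv[1:] == ["map"] else reducer()
```

The job would be submitted with the hadoop-streaming jar, pointing -mapper at `python validate.py map` and -reducer at `python validate.py reduce` over the HDFS input path.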