
Sr. Data Engineer Resume


Los Angeles, CA

SUMMARY

  • 7+ years of professional IT experience with cloud technologies and a background in data warehousing and business intelligence, with expertise in designing, developing, analyzing, implementing, and supporting DW/BI applications post-installation.
  • Good experience in developing web applications implementing the Model-View-Controller (MVC) architecture using the Django framework.
  • Good knowledge of extracting models and trends from raw data, collaborating with the data science team, and working on cloud platforms like AWS and Azure.
  • Strong experience in web-based UI design and interface development using HTML5, CSS3, Bootstrap, JavaScript, and jQuery.
  • Hands-on experience with Hadoop ecosystem components such as Hadoop, Spark, HDFS, YARN, Tez, Hive, Sqoop, MapReduce, Pig, Oozie, Kafka, Storm, and HBase.
  • Experience in Apache Spark, Spark Streaming, Spark SQL, and NoSQL databases like HBase, Cassandra, and MongoDB.
  • Experience in creating complex data pipeline processes using T-SQL scripts, SSIS packages, Alteryx workflows, PL/SQL scripts, cloud REST APIs, Python scripts, GCP Composer, and GCP Dataflow.
  • Experience using cloud providers such as AWS, with services like EC2, S3, VPC, databases, and ECS.
  • Performed ETL using Spark to transform data and perform data wrangling before feeding it to models.
  • Expertise in using Spark SQL, T-SQL, and U-SQL with various data sources like JSON, Parquet, and Hive (a short illustrative sketch follows this summary).
  • Experience with Snowflake multi-cluster warehouses, Snowflake virtual warehouses, and building Snowpipe pipelines.
  • Extensive experience in text analytics, generating data visualizations using Python, and creating dashboards using tools like Power BI.
  • Expertise in Java and the Spring Framework (AOP and MVC modules) for developing dynamic web applications.
  • Good experience with the Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC) in Agile Scrum, Waterfall, and V-Model environments.
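
For illustration, a minimal PySpark sketch of the kind of Spark SQL work over JSON and Parquet sources described above; the bucket paths, view names, and columns are hypothetical placeholders, not details from any specific project.

    # Minimal sketch, assuming hypothetical S3 paths and column names.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-source-sql").getOrCreate()

    # Load one JSON source and one Parquet source into DataFrames.
    orders = spark.read.json("s3a://example-bucket/orders.json")
    customers = spark.read.parquet("s3a://example-bucket/customers.parquet")

    # Register temporary views so both sources can be joined with plain SQL.
    orders.createOrReplaceTempView("orders")
    customers.createOrReplaceTempView("customers")

    result = spark.sql("""
        SELECT c.customer_id, COUNT(*) AS order_count
        FROM orders o
        JOIN customers c ON o.customer_id = c.customer_id
        GROUP BY c.customer_id
    """)
    result.show()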

TECHNICAL SKILLS

Big Data Technologies: Hadoop, MapReduce, HDFS, Sqoop, PIG, Hive, HBase, Kafka, Zookeeper, Apache Spark.

Databases: Oracle, MySQL, SQL Server, MongoDB, Cassandra, DynamoDB, PostgreSQL.

Tools: PyCharm, Visual Studio, SQL*Plus, SQL Developer, TOAD, SQL Navigator, SQL Server Management Studio.

Programming: Python, PySpark, Scala, Java, C, C++, Shell script, SQL.

Cloud Technologies: AWS, Microsoft Azure.

Frameworks/Platforms: Django REST Framework, MVC, Hortonworks.

Versioning tools: SVN, Git, GitHub.

Workflow Orchestration: Apache Airflow.

PROFESSIONAL EXPERIENCE

Sr. Data Engineer

Confidential, Los Angeles, CA

Responsibilities:

  • Responsible for building the Confidential data cube using the Spark framework by writing Spark SQL queries in Python to improve data processing efficiency and reporting query response time.
  • Deployed AWS CLI scripts for data replication between on-premises servers and AWS S3 and Lambda, automated data replication to AWS via the Windows job scheduler, and prepared technical documentation for AWS services.
  • Implemented Kafka/Spark Streaming pipelines for data ingestion using StreamSets Data Collector.
  • Developed database objects and designed and implemented SSIS (ETL) packages, utilizing SSIS, stored procedures, views, and MS SQL.
  • Implemented Apache Hadoop frameworks like Hive, Hue, Pig, and Ganglia to process and analyze datasets from various data sources and applications.
  • Developed Kafka consumer APIs in Python for consuming data from Kafka topics (a minimal consumer sketch follows this list).
  • Designed and built ETL/ELT processes from scratch using SSIS and stored procedures.
  • Designed, developed, and tested the frontend of a website using HTML5, CSS3, JavaScript, and React.
  • Created processes, modules, and jobs in Apache Airflow and the Talend scheduler to execute scripts using Google Cloud Platform (GCP) services like Cloud Composer, Compute Engine, Cloud Load Balancing, Cloud Storage, and Cloud SQL.
  • Worked on various Relational Databases like Teradata, Postgres, MySQL, Oracle 10g, DB2.
  • Created ETL/ELT jobs and sequencers in IBM DataStage Designer to load data from flat files into the data warehouse.
  • Used Kafka for live streaming data and performed analytics on it; worked with Sqoop to transfer data between relational databases and Hadoop.
  • Created Python/SQL scripts in Databricks notebooks to transform data from Redshift tables into Snowflake via S3 buckets.
  • Involved in the CI/CD process using Git, Nexus, and Jenkins: job creation, Maven builds, Docker image creation, and deployment into the AWS environment.
  • Developed a J2EE-based web application by implementing the Spring MVC framework.
  • Implemented Apache Spark code to read multiple tables from real-time records and filter the data based on requirements.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using a mix of older and newer Airflow operators.
  • Designed and implemented effective Analytics solutions and models with Snowflake.
  • Created Tableau visualizations by connecting to AWS Elastic MapReduce (EMR) Hadoop clusters.
  • Developed ETL pipelines in and out of data warehouses using a combination of Databricks, Python, ETL tools, and Snowflake SQL.
  • Processed and loaded bounded and unbounded data from Google Pub/Sub topics to BigQuery using Cloud Dataflow with Python and R.
  • Designed and implemented distributed systems with Apache Spark and Python/Scala.
  • Used the Spark API over Cloudera/Hadoop/YARN to perform analytics on data in Hive and MongoDB.
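
A minimal sketch of the kind of Python Kafka consumer described above; the topic name, broker address, and consumer group are hypothetical, and the kafka-python package is assumed.

    # Minimal sketch, assuming a hypothetical topic and a local broker.
    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "events",                               # hypothetical topic name
        bootstrap_servers=["localhost:9092"],   # hypothetical broker address
        group_id="example-consumer-group",
        auto_offset_reset="earliest",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    # Each record's value is deserialized to a dict by the lambda above.
    for record in consumer:
        print(record.topic, record.partition, record.offset, record.value)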

Environment: AWS (EC2, S3, CloudFormation), Kafka, Apache Spark, Apache Hadoop, Oracle, Linux/Windows, Java, Scala, Python, R, MySQL, BigQuery.

Data Engineer

Confidential, Atlanta, GA

Responsibilities:

  • Responsible for analyzing large datasets and deriving customer usage patterns by developing MapReduce programs using Python and R.
  • Involved in modeling datasets from a variety of data sources such as Hadoop (using Pig, Hive, and Spark), Teradata, and Snowflake for ad-hoc analysis, with a fair understanding of Agile methodology and practice.
  • Implemented Kafka/Spark Streaming pipelines for data ingestion using StreamSets Data Collector.
  • Used Apache Airflow for scheduling and monitoring workflows across AWS services.
  • Worked on Spark using Python, creating DataFrames with the Spark SQL context for faster data processing and for computing analytics.
  • Utilized Kubernetes and Docker as the runtime environment for the continuous integration/continuous deployment system to build, test, and deploy.
  • Created various pipelines for data transformation for machine learning models using PySpark.
  • Developed, deployed, and supported client-facing big data processes in a production environment, using languages like Scala, Java, and Python alongside CI/CD tools.
  • Worked on downloading BigQuery data into pandas DataFrames for sophisticated ETL (an illustrative sketch follows this list).
  • Worked on GCP's Compute Engine, Cloud SQL, Datastore, BigQuery, Pub/Sub, and Dataproc, and built an SSH tunnel to Google Dataproc to access the YARN manager and track Spark tasks.
  • Imported real-time and batch data from various sources into S3 and used AWS Lambda for processing applications in Snowflake.
  • Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats.
  • Designed and developed the UI of the website using Python, HTML, XHTML, AJAX, CSS and JavaScript.
  • Designed materialized views, scheduled jobs, ETL workflows and reporting which enabled dataflow across 8 ERP systems.
  • Worked on designing and implementing complex applications and distributed systems on public cloud infrastructure (AWS, GCP, etc.).
  • Involved in data streaming with Kafka and worked in Databricks using the Apache Spark SQL batch and streaming APIs with Hive for data management and analytics, tracking Scrum work in Jira.
  • Worked on AWS EMR clusters for processing big data across a Hadoop cluster of virtual servers.
  • Worked on developing Kafka producers and consumers to handle millions of streaming events per second.
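
An illustrative sketch of pulling BigQuery results into a pandas DataFrame for downstream ETL, as mentioned above; the project, dataset, and table names are hypothetical, and the google-cloud-bigquery client library (with pandas support installed) is assumed.

    # Minimal sketch, assuming hypothetical project and table names.
    from google.cloud import bigquery

    client = bigquery.Client(project="example-project")

    query = """
        SELECT user_id, COUNT(*) AS event_count
        FROM `example-project.analytics.events`
        GROUP BY user_id
    """

    # Run the query and materialize the result set as a pandas DataFrame.
    df = client.query(query).to_dataframe()
    print(df.head())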

Environment: Kafka, PySpark, Scala, Kubernetes, Docker, Terraform, PL/SQL, Snowflake, Hadoop, MapReduce, Python, PySpark Executor, Sqoop, Hive, Linux, Apache Airflow.

Data Engineer

Confidential, Shreveport, LA

Responsibilities:

  • Involved in SDLC requirements gathering, analysis, design, development, and testing of the application using Agile methodology.
  • Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.
  • Implemented a proof of concept deploying this product in an AWS S3 bucket and Snowflake.
  • Utilized AWS services with a focus on big data architecture, analytics, and enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Developed PySpark scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation and queries, writing results back into the S3 bucket.
  • Wrote, compiled, and executed programs as necessary using Apache Spark with PySpark to perform ETL jobs on ingested data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, used the Spark engine and Spark SQL for data analysis, and provided results to the data scientists for further analysis.
  • Prepared Python scripts to automate the ingestion process as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.
  • Developed Spark workflows using PySpark to pull data from the AWS S3 bucket and Snowflake, applying transformations to it.
  • Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3, DynamoDB, and Snowflake.
  • Implemented AWS Lambda functions to run scripts in response to events in an Amazon DynamoDB table or S3 bucket, or to HTTP requests through Amazon API Gateway.
  • Migrated data from the AWS S3 bucket to Snowflake by writing a custom read/write Snowflake utility function using PySpark.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the Confidential data lake AWS S3 bucket.
  • Involved in configuring Apache Airflow for the S3 bucket and the Snowflake data warehouse, and created DAGs to run in Airflow (a minimal DAG sketch follows this list).
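
A minimal Apache Airflow DAG sketch of the kind described above; the DAG id, schedule, and script path are hypothetical placeholders rather than details from the actual project.

    # Minimal sketch, assuming Airflow 2.x and a hypothetical PySpark job script.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="s3_to_snowflake_daily",       # hypothetical DAG id
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_to_snowflake = BashOperator(
            task_id="run_pyspark_load",
            # Hypothetical command invoking the PySpark S3-to-Snowflake utility.
            bash_command="spark-submit /opt/jobs/s3_to_snowflake.py",
        )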

Environment: Agile Scrum, MapReduce, Snowflake, Pig, Spark, PySpark, Hive, Kafka, Python, Airflow, JSON, Parquet, CSV, Code Cloud, AWS.

Data Analyst

Confidential, Chicago, IL

Responsibilities:

  • Responsible for migrating the workflows from development to production environment.
  • Performed data analysis and data profiling, and worked on data transformations and data quality rules.
  • Used Kubernetes to orchestrate the deployment, scaling and management of Docker Containers.
  • Involved in a graph engine built with Kafka Streams and KTable components.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python, and built models using deep learning frameworks (a small pandas sketch follows this list).
  • Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS.
  • Worked on analyzing the Hadoop cluster and different big data analytics tools including Pig, Hive, HBase, Spark, and Sqoop.
  • Utilized Struts tag libraries, JSP, JavaScript, HTML, and CSS.
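
A small pandas/NumPy sketch of the kind of data cleaning and feature scaling described above; the file name and column names are hypothetical placeholders.

    # Minimal sketch, assuming a hypothetical CSV with age, income, and spend columns.
    import numpy as np
    import pandas as pd

    df = pd.read_csv("customers.csv")   # hypothetical input file

    # Basic cleaning: drop duplicate rows and fill missing numeric values.
    df = df.drop_duplicates()
    df["age"] = df["age"].fillna(df["age"].median())

    # Simple min-max feature scaling on a numeric column.
    df["income_scaled"] = (df["income"] - df["income"].min()) / (
        df["income"].max() - df["income"].min()
    )

    # Example engineered feature: log-transformed spend.
    df["log_spend"] = np.log1p(df["spend"])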

Environment: MS SQL, Hadoop, HDFS, Pig, Hive, MapReduce, Python libraries (NumPy, pandas, scikit-learn, SciPy, Matplotlib), PL/SQL, MDM, SQL Server, DB2, Git.
