
Sr. Big Data/Data Engineer Resume


Jacksonville, FL

SUMMARY

  • Around 8 years of professional experience in the IT industry, specializing in Data Warehousing and Decision Support Systems, with extensive experience implementing full life cycle Data Warehousing projects and Hadoop/Big Data technologies for storage, querying, processing, and analysis of data.
  • Software development involving cloud computing platforms such as Amazon Web Services (AWS), Azure, and Google Cloud Platform (GCP).
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables (see the Beam sketch after this list).
  • Knowledge of installing, configuring, and using Hadoop ecosystem components such as Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Zookeeper, and Flume.
  • Experience in analyzing data using HiveQL, HBase, and custom MapReduce programs.
  • Experience in importing and exporting data using Sqoop between HDFS and relational database systems such as Teradata, Oracle, and SQL Server.
  • Designed and implemented migration strategies for traditional systems to Azure (lift and shift, Azure Migrate, and other third-party tools); worked on the Azure suite: Azure SQL Database, Azure Data Lake Storage (ADLS), Azure Data Factory (ADF) V2, Azure SQL Data Warehouse, Azure Service Bus, Azure Key Vault, Azure Analysis Services (AAS), Azure Blob Storage, Azure Search, Azure App Service, and Azure Data Platform services.
  • Hands-on experience in GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Dataproc, and Stackdriver.
  • Developed complex mappings to load data from various sources into the data warehouse, using transformations/stages such as Joiner, Transformer, Aggregator, Update Strategy, Rank, Lookup, Filter, Sorter, Source Qualifier, and Stored Procedure.
  • Implemented POC to migrate map reduce jobs into Spark transformations using Python.
  • Developed Apache Spark jobs using Python in a test environment for faster data processing and used Spark SQL for querying.
  • Experienced in Spark Core, Spark RDD, Pair RDD, Spark Deployment Architectures.
  • Experienced with performing real time analytics on NoSQL databases like HBase and Cassandra.
  • Worked on AWS EC2, EMR and S3 to create clusters and manage data using S3.
  • Experienced with Dimensional modelling, Data migration, Data cleansing, Data profiling, and ETL Processes features for data warehouses.
  • Strong understanding of the entire AWS Product and Service suite primarily EC2, S3, VPC, Lambda, Redshift, Spectrum, Athena, EMR(Hadoop) and other monitoring service of products and their applicable use cases, best practices and implementation, and support considerations.
  • Strong experience in Unix and Shell Scripting. Experience on Source control repositories like SVN, CVS and GITHUB.
  • Extensive experience in designing and implementation of continuous integration, continuous delivery, continuous deployment through Jenkins
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Expertise in Creating, Debugging, Scheduling and Monitoring jobs using Airflow and Oozie.
  • Strong skills in Core JAVA, Swing, multithreading, and J2EE Technology with in-depth knowledge in JSF, JSP, Servlets, EJB, ORM, JDBC, Hibernate, Spring, Applets, JAI and XML.
  • Created Talend ETL jobs to receive attachment files from POP e-mail using tPop, tFileList, and tFileInputMail, then loaded data from the attachments into the database and archived the files.
  • Experience working with GraphQL queries and the Apollo GraphQL library.
  • Developed Tableau data visualizations using cross tabs, heat maps, box-and-whisker charts, scatter plots, geographic maps, pie charts, bar charts, and density charts.
  • Expertise in Snowflake, creating and maintaining tables and views.
  • Worked on DataStage tools such as DataStage Designer, DataStage Director, and DataStage Administrator.
  • Good Experience on SDLC (Software Development Life cycle) and Agile methodology.
  • Actively participated in Scrum meetings, Sprint planning, Refinement and Retrospective ceremonies.
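
The summary above mentions a Python/Apache Beam validation job run on Cloud Dataflow. A minimal sketch of that idea is shown below, comparing the row count of a raw GCS file with the row count of a BigQuery table; the project, bucket, and table names are hypothetical placeholders, not the original job.

```python
# Hedged sketch: compare row counts between a raw file in GCS and a BigQuery table.
# All resource names (project, bucket, table) are illustrative assumptions.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    options = PipelineOptions(
        runner="DataflowRunner",        # use "DirectRunner" for a local test
        project="my-gcp-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=options) as p:
        file_count = (
            p
            | "ReadRawFile" >> beam.io.ReadFromText(
                "gs://my-bucket/raw/events.csv", skip_header_lines=1)
            | "CountFileRows" >> beam.combiners.Count.Globally()
        )
        table_count = (
            p
            | "ReadBigQuery" >> beam.io.ReadFromBigQuery(
                table="my-gcp-project:staging.events")
            | "CountTableRows" >> beam.combiners.Count.Globally()
        )
        (
            (file_count, table_count)
            | "MergeCounts" >> beam.Flatten()
            | "CollectCounts" >> beam.combiners.ToList()
            # Logs True when the two counts agree; a real job would write the
            # result to a table or raise an alert instead.
            | "CompareCounts" >> beam.Map(
                lambda counts: print("counts match:", counts[0] == counts[1]))
        )


if __name__ == "__main__":
    run()
```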

TECHNICAL SKILLS

Hadoop/Big Data: Hadoop (Yarn), HDFS, Map Reduce, Spark, Hive, Sqoop, Flume, Zookeeper, Oozie, Tez

Programming Languages: SQL, HQL, MS SQL, Python, Scala

Distributed computing: Amazon EMR (Elastic MapReduce), Hortonworks (Ambari), Cloudera (HUE), PuTTY, Snowflake

Relational Databases: Oracle 11g/10g/9i, MySQL, SQL Server 2005/2008, PostgreSQL

NoSQL Databases: HBase, MongoDB, Cassandra

Cloud Environments: AWS (EC2, EMR, S3, Kinesis, DynamoDB, SQS, Lambda), Amazon Redshift, Azure, Docker containers, Kubernetes, GCP (Google Cloud Platform)

Data File Types: JSON, CSV, PARQUET, AVRO, TEXTFILE

PROFESSIONAL EXPERIENCE

Sr. Big Data/Data Engineer

Confidential, Jacksonville, FL

Responsibilities:

  • Involved in migrating on-premises data (Oracle, SQL Server, DB2, MongoDB) to Azure Data Lake Store using Azure Data Factory.
  • Developed the functionalities using the Agile Scrum methodology.
  • Developed data processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Developed a framework for converting existing PowerCenter mappings to PySpark (Python and Spark) jobs.
  • Excellent experience with Python development under Linux/Unix OS (Debian, Ubuntu, SUSE Linux, Red Hat Linux, Fedora) and Windows OS.
  • Worked collaboratively to manage build outs of large data clusters and real time streaming with Spark.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
  • Designed, developed, tested, and maintained Tableau functional reports based on user requirements.
  • Delivered ETL tables using BigQuery, Apache Beam, and Cloud Dataprep.
  • Good understanding of Spark architecture with Databricks and Structured Streaming; set up AWS and Microsoft Azure with Databricks, configured Databricks Workspace for business analytics, managed clusters in Databricks, and managed the machine learning lifecycle.
  • Developed MapReduce flows in the Microsoft HDInsight Hadoop environment using Python.
  • Created PySpark data frames to bring data from DB2 to Amazon S3 (see the PySpark sketch after this list).
  • Currently setting up Spark environment to create data pipelines to load data into the Data Lake using Scala and Python scripting.
  • Converted existing BO reports to Tableau dashboards.
  • Used Airflow for orchestration and scheduling of the ingestion scripts.
  • Developed Python code for task definitions, dependencies, SLA watchers, and time sensors for each job, for workflow management and automation with Airflow (see the Airflow sketch after this list).
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Responsible for loading Data pipelines from web servers using Kafka and Spark Streaming API.
  • Used Spark for interactive queries, processing of streaming data and integration with popular NoSQL databases for huge volume of data.
  • Monitored system prototypes by ingesting data and customizing Apache Beam jobs to run and normalize data.
  • Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using Scala.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Optimized PySpark jobs to run on a Kubernetes cluster for faster data processing.
  • Responsible for estimating cluster size and for monitoring and troubleshooting the Spark Databricks cluster.
  • Developed Hive UDFs and Pig UDFs using Python in the Microsoft HDInsight environment.
  • Repartitioned job flows by determining the best available resource consumption in DataStage PX.
  • Used Python to automate Hive jobs and read configuration files.
  • Used ScalaTest for writing test cases and coordinated with the QA team on end-to-end testing.
  • Provided production support to Tableau users and wrote custom SQL to support business requirements.
  • Developed the batch scripts to fetch the data from AWS S3 storage and do required transformations in Scala using Spark framework.
  • Designed and developed Power BI graphical and visualization solutions with business requirement documents and plans for creating interactive dashboards.
  • Built S3 buckets and managed policies for S3 buckets and used S3 bucket and Glacier for storage and backup on AWS.
  • Generated ad-hoc reports in Excel Power Pivot and shared them via Power BI with decision makers for strategic planning.
  • Used JSON schema to define table and column mapping from S3 data to Redshift.
  • Responsible for building scalable distributed data solutions in an Amazon EMR cluster environment.
  • Used AWS Data Pipeline for data extraction, transformation, and loading from homogeneous or heterogeneous data sources and built various graphs for business decision-making using Python's Matplotlib library.
  • Created Talend jobs to copy the files from one server to another and utilized Talend FTP components.
  • Created and configured a Snowflake warehouse strategy to move a terabyte of data from S3 into Snowflake via PUT scripts; loaded data from an AWS S3 bucket into the Snowflake database using Snowpipe.
  • Implemented Spark using Scala and SparkSQL for faster testing and processing of data.
  • Developed Kafka producer and consumers, HBase clients, Spark, and Hadoop MapReduce jobs along with components on HDFS, Hive.
  • Automated AWS EC2/VPC/S3/SQS/SNS-based infrastructure through Terraform, Ansible, Python, and Bash scripts.
  • Wrote Ansible playbooks with Python SSH as the wrapper to manage and test configurations of AWS nodes.
  • Used ETL methodologies and best practices to create Talend ETL jobs. Followed and enhanced programming and naming standards.
  • Implemented a large Lambda architecture using Azure data platform capabilities such as Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Data Integrity checks have been handled using Hive queries, Hadoop, and Spark.
  • Worked on performing transformations & actions on RDDs and Spark Streaming data with Scala.
  • Used Hive and created Hive Tables and was involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Hands-on experience with Amazon DynamoDB and Fivetran API connectors.
  • Involved in NoSQL database design, integration, and implementation.
  • Loaded data into NoSQL database HBase.
  • Performed version control using GitHub to maintain code across environments.
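
A minimal PySpark sketch of the DB2-to-S3 ingestion referenced above; the JDBC URL, credentials, source table, and bucket are hypothetical placeholders, and it assumes the DB2 JDBC driver and an S3A-configured cluster are available.

```python
# Hedged sketch: read a DB2 table over JDBC and land it in S3 as Parquet.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2_to_s3_ingest").getOrCreate()

# Read the source table over JDBC (assumes the DB2 driver jar is on the classpath).
source_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:db2://db2-host:50000/SAMPLEDB")
    .option("driver", "com.ibm.db2.jcc.DB2Driver")
    .option("dbtable", "SCHEMA.CUSTOMERS")
    .option("user", "db2_user")
    .option("password", "db2_password")
    .load()
)

# Light cleanup before landing the data.
clean_df = source_df.dropDuplicates()

# Write Parquet to S3 (assumes the S3A connector is configured on the cluster).
clean_df.write.mode("overwrite").parquet("s3a://my-data-lake/raw/customers/")
```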
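
The Airflow work referenced above (task definitions, dependencies, SLA watchers, and time sensors) can be illustrated with a minimal DAG sketch like the one below; the DAG id, task names, schedule, and SLA window are illustrative assumptions, not the original jobs.

```python
# Hedged sketch of an Airflow 2.x DAG with dependencies, an SLA, and a time sensor.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator
from airflow.sensors.time_delta import TimeDeltaSensor


def validate_load(**context):
    # Placeholder validation step; a real task would compare source and target counts.
    print("validated load for", context["ds"])


default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=1),  # missed SLAs show up in Airflow's SLA-miss report
}

with DAG(
    dag_id="daily_ingest_example",      # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="0 2 * * *",      # run daily at 02:00
    default_args=default_args,
    catchup=False,
) as dag:
    wait = TimeDeltaSensor(task_id="wait_for_window", delta=timedelta(minutes=30))
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")
    validate = PythonOperator(task_id="validate", python_callable=validate_load)

    wait >> extract >> load >> validate  # task dependencies
```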

Environment: Spark, Spark Streaming, Apache Kafka, Apache NiFi, Hive, Tez, AWS, ETL, Pig, UNIX, Linux, Tableau, Teradata, Airflow, AWS S3, EMR, Amazon Redshift, Scala, Sqoop, Hue, Oozie, Java, Python, GitHub, Azure Data Lake

Big Data Analyst/ Big Data Engineer

Confidential, Chicago, IL

Responsibilities:

  • Processed data into HDFS by developing solutions, analyzed the data using Map Reduce, Pig, Hive and produced summary results from Hadoop.
  • Used Sqoop to export data from the Hadoop Distributed File System (HDFS) to an RDBMS.
  • Involved in loading and transforming sets of structured, semi-structured, and unstructured data, and analyzed them by running Hive queries and Pig scripts.
  • Experience migrating SQL databases to Azure Data Lake, Data Lake Analytics, Databricks, and Azure SQL Data Warehouse.
  • Was responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
  • Worked on file formats such as text files, Avro, and Parquet; extensively worked on Hive, Pig, and Sqoop for sourcing and transformations.
  • Migrated data into the data pipeline using Databricks, Spark SQL, and Scala.
  • Worked on creating RDDs and DataFrames for the required input data and performed data transformations using Spark with Python.
  • Built a Scala API for backend support of the graph database user interface.
  • Developed applications primarily in Linux environments and familiar with Linux commands.
  • Involved in requirement analysis, design, coding and implementation.
  • Continuous monitoring and managing the Hadoop Cluster using Cloudera Manager.
  • Involved in migrating existing traditional ETL jobs to Spark and Hive Jobs on new cloud data lake.
  • Controlled and granted database access and migrated on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Wrote and executed various MySQL database queries from Python using the Python MySQL connector and the MySQLdb package.
  • Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift (see the Glue sketch after this list).
  • Created and maintained highly scalable and fault-tolerant multi-tier AWS and Azure environments spanning multiple availability zones using Terraform and CloudFormation.
  • Wrote complex Spark applications to perform various denormalizations of the datasets and create a unified data analytics layer for downstream teams.
  • Involved in creating Hive scripts for performing ad hoc data analysis required by the business teams.
  • Used Git, GitHub, and Amazon EC2 with deployment via Heroku; used extracted data for analysis and carried out various mathematical operations using the Python libraries NumPy and SciPy.
  • Used AWS Glue for data transformation, validation, and cleansing.
  • Worked on Informatica Power Center tools- Designer, Repository Manager, Workflow Manager, and Workflow Monitor.
  • Worked on generating dynamic case statements with Spark, based on Excel files provided by the business.
  • Worked on Autosys for scheduling the Oozie Workflows.
  • Used SVN for branching, tagging, and merging.
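
A minimal AWS Glue (PySpark) sketch of the S3-to-Redshift load referenced above; the catalog database, table, Glue connection, target table, and temp bucket are hypothetical placeholders.

```python
# Hedged sketch of a Glue ETL job: read S3-backed catalog data, clean it, load to Redshift.
import sys

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read campaign files that a crawler registered in the Glue Data Catalog.
campaigns = glue_context.create_dynamic_frame.from_catalog(
    database="marketing_raw",        # assumed catalog database
    table_name="campaigns_parquet",  # assumed catalog table over the S3 files
)

# Drop rows with no campaign id before loading (assumed column name).
clean_df = campaigns.toDF().dropna(subset=["campaign_id"])
clean_dyf = DynamicFrame.fromDF(clean_df, glue_context, "clean_campaigns")

# Load into Redshift through a preconfigured Glue JDBC connection.
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=clean_dyf,
    catalog_connection="redshift-connection",  # assumed Glue connection name
    connection_options={"dbtable": "analytics.campaigns", "database": "dev"},
    redshift_tmp_dir="s3://my-glue-temp/redshift/",
)
job.commit()
```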

Environment: CDH4, Cloudera Manager, MapReduce, HDFS, Linux, Hive, Pig, Scala, HBase, Flume, MySQL, Sqoop, Oozie, Python, Azure.

Big Data Analyst/ Big Data Engineer

Confidential, Chicago, IL

Responsibilities:

  • Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.
  • Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.
  • Experienced in SQL programming and creation of relational database models
  • Installed Oozie Workflow engine to run multiple Hive and Pig Jobs.
  • Developed Spark scripts using Scala as per requirements, using the Spark 1.5 framework.
  • Used SQL Azure extensively for database needs in various applications.
  • Developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
  • Developed Simple to complex Map/Reduce Jobs using Hive. Involved in loading data from the UNIX file system to HDFS.
  • Performed query optimization, execution plan analysis, and performance tuning of queries for better performance in SQL.
  • Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
  • Created on-demand tables on S3 files using Lambda functions and AWS Glue with Python and Spark (see the Lambda sketch after this list).
  • Good experience in handling data manipulation using Python Scripts.
  • Developed business intelligence solutions using SQL Server Data Tools 2015 and 2017 and loaded data into SQL Server and Azure cloud databases.
  • Analyzed large amounts of data sets to determine optimal ways to aggregate and report on it.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Wrote SQL queries, stored procedures, triggers, and functions for MySQL databases.
  • Migrated ETL processes from Oracle to Hive to test easier data manipulation.
  • Optimized Pig scripts and Hive queries to increase efficiency and added new features to existing code.
  • Worked on creating tabular models on Azure Analysis Services to meet business reporting requirements.
  • Used Hive and created Hive tables and was involved in data loading and writing Hive UDFs.
  • Used Sqoop to import data into HDFS and Hive from other data systems.
  • Installed the Oozie workflow engine to run multiple Hive jobs.
  • Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
  • Developed Hive queries to process the data for visualizing and reporting.
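
One common way to implement the Lambda-plus-Glue pattern referenced above is to trigger a Glue crawler whenever a new file lands in S3, so an on-demand table is (re)created over it; the sketch below assumes that approach, and the crawler name is a hypothetical placeholder.

```python
# Hedged sketch of an S3-triggered Lambda handler that starts a Glue crawler.
import boto3

glue = boto3.client("glue")


def lambda_handler(event, context):
    # Each record corresponds to one object created in the bucket.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        print(f"new object s3://{bucket}/{key}, starting crawler")

    # Re-crawl the landing prefix so the Glue Data Catalog table reflects the new file.
    glue.start_crawler(Name="landing-zone-crawler")  # assumed crawler name
    return {"status": "crawler started"}
```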

Environment: Apache Hadoop, Cloudera Manager, CDH2, Python, CentOS, Java, MapReduce, Pig, Hive, Sqoop, Oozie, and SQL.

Associate Software Engineer

Confidential

Responsibilities:

  • Involved in Requirement gathering, writing design specification documents and identified the appropriate design pattern to develop the application.
  • Working for a Fortune Global 500 company, collaborated closely with international clients to diagnose business process and organizational problems and leveraged technology to determine how clients could seize new opportunities.
  • Worked on Oracle PL/SQL, a procedural programming language embedded in the database, along with SQL itself
  • Analyzed current processes and technologies, contributing through to the delivery/ integration of new solutions
  • Delivered solutions via an Agile delivery methodology, leveraging DevOps to bring solutions to life quickly for international clients, mainly from the Netherlands, Austria, Romania, and Ireland.
  • Designed use case diagrams, class diagrams, sequence diagrams, and object models.
  • Involved in designing user screens using HTML as per user requirements.
  • Used Spring-Hibernate integration in the back end to fetch data from Oracle and MYSQL databases.
  • Installed Web Logic Server for handling HTTP Request/Response.
  • Used Subversion for version control and created automated build scripts.

Environment: CSS, HTML, JavaScript, AJAX, JUNIT, Struts, Spring, Hibernate, Oracle, Eclipse IDE
