Senior Data Engineer Resume
Chicago, IL
SUMMARY
- Certified professional data engineer with around 8 years of IT experience across multiple client domains and technology stacks.
- Experienced in building highly scalable, large-scale applications using cloud, big data, DevOps, and Spring Boot technologies.
- Experience working on multi-cloud, migration, and scalable application projects.
- Built Spark data pipelines in Python and Scala, applying various optimization techniques.
- Experience working with Hadoop distributions such as Cloudera, Hortonworks, and MapR.
- Experienced in transferring data between HDFS and RDBMS systems using tools such as Sqoop, Talend, and Spark.
- Expert in incremental data ingestion from various RDBMS sources using Apache Sqoop.
- Developed scalable applications for real-time ingestion into various databases using Apache Kafka.
- Developed Pig Latin scripts and MapReduce jobs for large-scale data transformations and loads.
- Experience designing, developing, and maintaining data lake projects using a range of big data tools.
- Experience in building Scala applications for loading data into NoSQL databases (MongoDB).
- Implemented various optimization techniques in Hive and Spark scripts for data transformations.
- Expert in writing automation scripts in shell and Python.
- Experience in working with NoSQL Databases like HBase, Cassandra and MongoDB.
- Migrated data from sources such as Oracle, MySQL, and Teradata to Hive, HBase, and HDFS (see the ingestion sketch at the end of this summary).
- Experienced in building Jupyter notebooks using PySpark for extensive data analysis.
- Experience working with cloud platforms including AWS, Azure, and GCP.
- Experience ingesting data from and exporting data to Apache Kafka using Spark Streaming.
- Implemented streaming applications to consume data from Event Hub and Pub/Sub.
- Experience using optimized file formats such as Avro, Parquet, and SequenceFile.
- Experience using Azure services such as Azure Data Factory, Azure Data Lake, and Azure Synapse.
- Developed scalable applications using AWS services such as Redshift and DynamoDB.
- Built pipelines in Snowflake for extensive data aggregations.
- Experience with GCP services such as BigQuery, Pub/Sub, Cloud SQL, and Cloud Functions.
- Built custom dashboards in Power BI for reporting purposes.
- Experience in building continuous integration and deployments using Jenkins, Drone, Travis CI.
- Expert in building containerized applications with Docker and Kubernetes and provisioning infrastructure with Terraform.
- Experience in building metrics dashboards and alerts using Grafana and Kibana.
- Worked on containerization technologies like Docker and Kubernetes for scaling applications.
- Experience with integration tools such as Talend and NiFi for ingesting batch and streaming data.
- Experience migrating data warehouse applications to Snowflake.
- Experienced working in both Agile and Waterfall delivery models.
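The following is a minimal PySpark sketch of the RDBMS-to-data-lake batch ingestion pattern referenced in this summary; the connection URL, credentials, table names, and paths are illustrative placeholders rather than details from any specific engagement.

    # Minimal sketch of a batch RDBMS-to-data-lake load with PySpark.
    # All connection details, table names, and paths below are placeholders.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("rdbms_to_datalake")      # hypothetical application name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read a source table over JDBC (Oracle shown; MySQL/Teradata differ only in URL and driver).
    orders = (
        spark.read.format("jdbc")
        .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")   # placeholder URL
        .option("dbtable", "sales.orders")                        # placeholder table
        .option("user", "etl_user")                               # placeholder credentials
        .option("password", "********")
        .option("fetchsize", 10000)
        .load()
    )

    # Land the data as partitioned Parquet in the raw zone and expose it as a Hive table.
    (
        orders.write.mode("overwrite")
        .partitionBy("order_date")
        .format("parquet")
        .saveAsTable("raw.orders")                                # placeholder Hive table
    )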
TECHNICAL SKILLS
Big Data Ecosystem: HDFS, MapReduce, YARN, Hive, HBase, Impala, Sqoop, Oozie, Tez, Spark
Cloud Environment: AWS, Azure and GCP
NoSQL: HBase, Cassandra, MongoDB
Databases: Oracle 11g/10g, Teradata, DB2, MS-SQL Server, MySQL, MS-Access
Programming Languages: Scala, Python, SQL, PL/SQL, Linux shell scripts
BI Tools: Tableau, Power BI, Apache Superset
Alerting & Logging: Grafana, Kibana
Automation: Airflow, NiFi, Oozie
PROFESSIONAL EXPERIENCE
Confidential, Chicago, IL
Senior Data Engineer
Responsibilities:
- Led sessions with the business, project managers, business analysts, and other key stakeholders to understand business needs and propose solutions from a data warehouse standpoint.
- Designed the ER diagrams, logical model (relationship, cardinality, attributes, and candidate keys) and physical database (capacity planning, object creation and aggregation strategies) for Oracle and Teradata as per business requirements using ER Studio.
- Imported data using Sqoop from source systems such as mainframes, Oracle, MySQL, and DB2 into the data lake raw zone.
- Developed AWS data pipelines to extract data from web logs and store it in Amazon EMR; worked with cloud services such as Redshift, S3, and EC2, and extracted data from Oracle Financials and the Redshift database.
- Implemented solutions for ingesting data from various sources and processing data at rest using Hadoop, MapReduce, HBase, and Hive.
- Performed predictive and what-if analysis in Python on data in HDFS; loaded files from Teradata into HDFS and from HDFS into Hive.
- Used AWS Lambda to validate, filter, sort, and otherwise transform every data change in a DynamoDB table and load the transformed data into another data store (see the Lambda sketch after this list).
- Worked with Amazon Redshift and AWS Kinesis data, created data models, and extracted metadata from Redshift and Elasticsearch using SQL queries to build reports.
- Migrated the existing architecture to AWS using Kinesis, Redshift, AWS Lambda, CloudWatch metrics, and Amazon Athena queries; compared alert generation between the Kafka and Kinesis clusters based on alerts arriving from S3 buckets.
- Experienced in dimensional modeling (star and snowflake schemas), transactional modeling, and slowly changing dimensions (SCD).
- Migrated objects from Teradata to Snowflake, scheduled Snowflake jobs using NiFi, and used NiFi to ping Snowflake to keep the client session alive.
- Used Spark/PySpark DataFrames, Spark SQL, and Spark MLlib extensively; designed and developed e-commerce POCs using Scala, Spark SQL, and MLlib.
- Developed Oracle stored procedures, functions, and packages to implement server-side logic; performed application and SQL tuning using explain plans and SQL tracing, and used materialized views for reporting requirements.
- Imported and exported data between Snowflake, Oracle, DB2, HDFS, and Hive using Sqoop for analysis, visualization, and report generation.
- Implemented microservices on a Kubernetes cluster and configured operators for Kubernetes applications and their components, such as Deployments, ConfigMaps, Secrets, and Services.
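Below is a minimal sketch of the DynamoDB-Streams-to-Lambda transformation pattern described above; the bucket name, key layout, and record handling are assumptions for illustration, not the production code.

    # Minimal sketch of an AWS Lambda handler consuming DynamoDB Streams events.
    # The target bucket, key layout, and record shape are illustrative assumptions.
    import json
    import boto3

    s3 = boto3.client("s3")
    TARGET_BUCKET = "example-curated-bucket"   # placeholder bucket name


    def handler(event, context):
        """Filter INSERT/MODIFY records, flatten them, and land them in S3 as JSON."""
        cleaned = []
        for record in event.get("Records", []):
            if record.get("eventName") not in ("INSERT", "MODIFY"):
                continue
            new_image = record["dynamodb"].get("NewImage", {})
            # Stream images are typed maps, e.g. {"order_id": {"S": "123"}}; keep the raw value.
            flat = {name: list(typed.values())[0] for name, typed in new_image.items()}
            cleaned.append(flat)

        if cleaned:
            key = f"validated/{context.aws_request_id}.json"   # placeholder key layout
            s3.put_object(Bucket=TARGET_BUCKET, Key=key,
                          Body=json.dumps(cleaned).encode("utf-8"))
        return {"processed": len(cleaned)}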
Environment: Python, Power BI, AWS Glue, Athena, SSRS, SSIS, AWS S3, AWS Redshift, AWS EMR, AWS RDS, DynamoDB, SQL, Tableau, Distributed Computing, Snowflake, Spark, Kafka, MongoDB, Hadoop, Linux Command Line, Data Structures, PySpark, Oozie, HDFS, MapReduce, Cloudera, HBase, Hive, Pig, Docker.
Confidential, Charlotte, North Carolina
Sr. Data Engineer
Responsibilities:
- Experience in developing scalable real-time applications for ingesting clickstream data using Kafka Streams and Spark Streaming.
- Worked on Talend integrations to ingest data from multiple sources into the data lake; used Azure Data Lake Storage Gen2 to store Excel and Parquet files and retrieved user data using the Blob API.
- Used Azure Databricks, PySpark, Spark SQL, Azure SQL Data Warehouse, and Hive to load and transform data (see the Databricks sketch after this list).
- Used Azure Data Lake as a source and pulled data using PolyBase.
- Used Azure Data Lake and Azure Blob Storage for storage and performed analytics in Azure Synapse Analytics.
- 1+ years of experience in Azure Cloud, Azure Data Factory, Azure Data Lake Storage, Azure Synapse Analytics, Azure Analysis Services, Azure Cosmos DB (NoSQL), Azure HDInsight big data technologies (Hadoop and Apache Spark), and Databricks.
- Experience designing Azure cloud architecture and implementation plans for hosting complex application workloads on Microsoft Azure.
- Ingested data from RDBMS sources, performed data transformations, and exported the transformed data to Cassandra per business requirements.
- Managed data lake data movements involving Hadoop and NoSQL databases such as HBase and Cassandra.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation across numerous file formats, including XML, JSON, CSV, and other compressed formats.
- Developed automated processes for flattening upstream data from Cassandra, which arrives in JSON format, using Hive UDFs.
- Worked on data loading into Hive for ingestion history and data content summaries.
- Created Impala tables and wrote SFTP and shell scripts to import data into Hadoop.
- Created Hive tables, loaded data, and wrote Hive UDFs, including UDFs for rating aggregation.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Applied performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Used JIRA for bug tracking and CVS for version control.
- Experience designing solutions with Azure tools such as Azure Data Factory, Azure Data Lake, Azure SQL, Azure SQL Data Warehouse, and Azure Functions.
- Worked on migrating data from HDFS to Azure HDInsight and Azure Databricks.
- Migrated existing processes and data from on-premises SQL Server and other environments to Azure Data Lake.
- Implemented multiple modules in microservices to expose data through RESTful APIs.
- Developed Jenkins pipelines for continuous integration and deployment.
- Experience analyzing the performance of Snowflake datasets.
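A minimal PySpark sketch of the Azure Databricks load-and-transform work referenced above; the storage account, containers, and column names are placeholders, and ADLS credentials are assumed to be configured at the cluster level.

    # Minimal sketch of reading raw data from ADLS Gen2, curating it, and writing it back.
    # Storage account, containers, and column names are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("adls_clickstream_curation").getOrCreate()

    raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/clickstream/"      # placeholder
    curated_path = "abfss://curated@examplestorage.dfs.core.windows.net/clicks/"   # placeholder

    clicks = spark.read.parquet(raw_path)

    # Basic curation: drop rows without a user id, derive a date partition, aggregate page views.
    daily_counts = (
        clicks.where(F.col("user_id").isNotNull())
        .withColumn("event_date", F.to_date("event_ts"))
        .groupBy("event_date", "page")
        .agg(F.count("*").alias("views"))
    )

    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(curated_path)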
Environment: PySpark, Kafka, Spark, Sqoop, Hive, Azure, Databricks, Azure Data Lake, Azure SQL, Jenkins, Grafana, Python, Shell, Microservices, RESTful APIs
Confidential, Miami, Florida
Data Engineer
Responsibilities:
- Developed and tuned a Spark Streaming application in Scala for processing data from Kafka (see the streaming sketch after this list).
- Imported batch data from various sources into HDFS on a regular basis using Sqoop.
- Experience working in cloud environments such as OpenShift and AWS; worked on AWS Data Pipeline to configure data loads from S3 to Redshift.
- Built data workflows using GCP, HBase, Bigtable, BigQuery, AWS EMR, Spark, Spark SQL, Scala, and Python.
- Created tables and stored procedures and extracted data using T-SQL for business users as required.
- Imported trading and derivatives data into HDFS using ecosystem components such as MapReduce, Pig, Hive, and Sqoop.
- Responsible for writing Hive queries and Pig scripts for data processing.
- Extracted, transformed, and loaded data between heterogeneous sources and destinations using AWS Redshift.
- Triggered AWS Glue jobs using AWS Lambda.
- Ran Sqoop jobs to import data from Oracle and other databases.
- Experience building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinating tasks within the team.
- Extensively used AWS Lambda, Kinesis, and CloudFront for real-time data collection.
- Optimized Pig scripts using ILLUSTRATE and EXPLAIN and used parameterized Pig scripts.
- Created Athena and Glue tables over existing CSV data using AWS Glue crawlers.
- Built Airflow data pipelines in GCP for ETL jobs using different Airflow operators (see the Airflow sketch after this list); imported and exported data into HDFS and Hive using Sqoop.
- Involved in configuring high availability, addressing Kerberos security issues, and restoring NameNode failures as needed to maintain zero downtime.
- Experience working on AWS EMR, S3 and EC2 instances.
- Used SVN for version control to check in code, create branches, and tag releases.
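A minimal Structured Streaming sketch of the Kafka-to-HDFS ingestion described in the first bullet above; the production application was written in Scala, so this PySpark version with placeholder brokers, topic, and paths (and the spark-sql-kafka connector assumed on the classpath) is only an illustrative equivalent.

    # Minimal PySpark Structured Streaming sketch: consume a Kafka topic and land it on HDFS.
    # Brokers, topic name, and output/checkpoint paths are illustrative placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")  # placeholder brokers
        .option("subscribe", "trades")                                   # placeholder topic
        .option("startingOffsets", "latest")
        .load()
    )

    # Kafka delivers key/value as binary; cast the payload to string for downstream parsing.
    parsed = events.select(
        F.col("key").cast("string").alias("trade_key"),
        F.col("value").cast("string").alias("payload"),
        "timestamp",
    )

    query = (
        parsed.writeStream.format("parquet")
        .option("path", "hdfs:///data/raw/trades")                   # placeholder output path
        .option("checkpointLocation", "hdfs:///checkpoints/trades")  # placeholder checkpoint
        .outputMode("append")
        .start()
    )
    query.awaitTermination()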
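And a minimal Airflow DAG sketch for the GCP ETL scheduling mentioned above; the DAG id, bucket, destination table, and schedule are placeholders, and the operator import assumes the Google provider package is installed.

    # Minimal Airflow DAG sketch: load daily CSV drops from GCS into BigQuery.
    # DAG id, bucket, object layout, and destination table are illustrative placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="daily_orders_load",                   # placeholder DAG id
        schedule_interval="@daily",
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        load_orders = GCSToBigQueryOperator(
            task_id="load_orders_to_bq",
            bucket="example-landing-bucket",                        # placeholder bucket
            source_objects=["orders/{{ ds }}/*.csv"],               # daily partition of CSV files
            destination_project_dataset_table="analytics.orders",   # placeholder dataset.table
            source_format="CSV",
            write_disposition="WRITE_APPEND",
        )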
Environment: HDP, HDFS, Hive, Spark, Oozie, HBase, AWS, Scala, Python, Bash, Kafka, Java, Jenkins, Spark Streaming, Tez, AWS Athena, Glue
Confidential
Software Engineer
Responsibilities:
- Collaborated with cross-functional teams (business stakeholders, developers, product managers, product owners, and management) to identify the requirements, data, and key insights needed to develop models that deliver business value.
- Worked on analyzing the Hadoop cluster and various big data analytic tools, including Pig, Hive, HBase, and Sqoop.
- Defined and extracted data from multiple sources, integrated disparate data into a common data model, and loaded data into target databases, applications, or files using efficient programming processes.
- Performed predictive and what-if analysis in Python on HDFS data; loaded files from Teradata into HDFS and from HDFS into Hive, and worked with NoSQL databases including HBase, MongoDB, and Cassandra.
- Developed MapReduce applications using the Hadoop MapReduce programming framework and used compression techniques to optimize MapReduce jobs.
- Developed workflow using Oozie for running MapReduce jobs and Hive Queries.
- Worked on distributed computing architectures such as Hadoop, Kubernetes, and Docker containers.
- Extensively used Pig for data cleansing and extracting the data from the web server output files to load into HDFS.
- Developed a data pipeline using Kafka and Storm to store data into HDFS.
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift; downloaded and uploaded data files to AWS using S3 components as part of ETL, and used AWS Data Pipeline to schedule an Amazon EMR cluster to clean and process web server logs stored in an Amazon S3 bucket.
- Developed and implemented scripts for database and data process maintenance, monitoring, and performance tuning.
- Imported data from relational data sources to HDFS and bulk-loaded data into HBase using MapReduce programs.
- Designed and developed scalable, efficient data pipeline processes to handle data ingestion, cleansing, transformation, and integration using Sqoop, Hive, Python, and Impala.
- Developed Pig UDFs to analyze customer behavior and Pig Latin scripts for processing data in Hadoop.
- Developed simple to complex MapReduce streaming jobs using Python (see the streaming mapper/reducer sketch after this list).
- Scheduled automated tasks with Oozie for loading data into HDFS through Sqoop and pre-processing the data with Pig and Hive.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks, and analysis.
- Developed Oozie actions such as Hive, shell, and Java actions to submit and schedule applications on the Hadoop cluster.
- Migrated ETL jobs to Pig scripts and worked with different file formats such as sequence files, XML, and JSON.
- Created reference tables using the Informatica Analyst and Informatica Developer tools.
- Built data warehouse structures, creating fact, dimension, and aggregate tables through dimensional modeling with star and snowflake schemas.
- Worked with the Oozie workflow engine for job scheduling.
- Implemented a continuous delivery pipeline with Docker and GitHub.
- Involved in creating/modifying worksheets and data visualization dashboards in Tableau.
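A minimal Hadoop Streaming sketch in Python for the MapReduce streaming jobs mentioned above; the tab-delimited layout, field positions, and the key being counted are illustrative assumptions, and in practice the mapper and reducer would live in separate files submitted with the hadoop-streaming jar (-files, -mapper, -reducer, -input, -output).

    # mapper.py: emit one (page, 1) pair per well-formed input record.
    # The tab-delimited layout and the column index are illustrative assumptions.
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue                    # skip malformed rows
        page = fields[2]                # assumed column holding the page name
        print(f"{page}\t1")

    # reducer.py: sum counts per key (Hadoop Streaming delivers input sorted by key).
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t", 1)
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")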
Environment: Agile, Scrum, Hadoop, MapReduce, HDFS, HBase, AWS, Redshift, S3, Kubernetes, Docker, UNIX, Hive, Sqoop, Oozie, Big Data ecosystem, Pig, Cloudera, Python, Impala, Teradata, MongoDB, Cassandra, Informatica, Unix scripts, XML files, JSON, REST API, Maven, GitHub, Tableau.