Sr. Data Engineer Resume
St Louis, MO
PROFESSIONAL SUMMARY
- 8+ years of experience in data engineering and in data pipeline design, development, and implementation as a Sr. Data Engineer, Data Developer, and Data Modeler.
- Strong experience across the Software Development Life Cycle (SDLC), including requirements analysis, design specification, and testing, in both Waterfall and Agile methodologies.
- Excellent working knowledge of Hadoop, Hive, Sqoop, Pig, HBase, and Oozie in real-time environments; worked on many modules for performance improvement and architecture design.
- Developed ETL processes to load data from multiple data sources into HDFS using Sqoop, Pig, and Oozie.
- Performed structural modifications using MapReduce and Hive, and analyzed data using visualization and reporting tools.
- Hands-on experience installing, configuring, monitoring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, Hortonworks, and Flume.
- Expertise in analytics, design, data warehouse modeling, development, implementation, maintenance, migration, and production support of large-scale enterprise data warehouses.
- Good exposure to Spark SQL, Spark Streaming, and the core Spark API for building data pipelines.
- Extensively used Python libraries including PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
- Hands-on use of Spark and Scala APIs to compare the performance of Spark with Hive and SQL, and of Spark SQL to manipulate DataFrames in Scala.
- Expertise in Python and Scala, including user-defined functions (UDFs) for Hive and Pig written in Python.
- Developed reports and dashboards in Tableau for quick review by business and IT users.
- Extensive knowledge of Tableau reporting objects such as facts, attributes, hierarchies, transformations, filters, prompts, calculated fields, sets, groups, and parameters.
- Experience working with Flume and NiFi for loading log files into Hadoop.
- Experienced in creating shell scripts to push data loads from various sources on the edge nodes into HDFS.
- Good experience implementing and orchestrating data pipelines using Oozie and Airflow.
- Worked with Cloudera and Hortonworks distributions.
- Expertise working with AWS cloud services such as EMR, S3, Redshift, and CloudWatch for big data development.
- Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Good working knowledge of the Amazon Web Services (AWS) cloud platform, including EC2, S3, VPC, ELB, IAM, DynamoDB, CloudFront, CloudWatch, Route 53, Elastic Beanstalk, Auto Scaling, Security Groups, EC2 Container Service (ECS), CodeCommit, CodePipeline, CodeBuild, CodeDeploy, Redshift, CloudFormation, CloudTrail, OpsWorks, Kinesis, SQS, SNS, and SES.
- Experience in data analysis, data profiling, data integration, migration, data governance, metadata management, master data management, and configuration management.
- Experience developing custom UDFs in Python to extend Hive and Pig Latin functionality.
- Experienced in building automated regression scripts in Python to validate ETL processes across multiple databases such as Oracle, SQL Server, Hive, and MongoDB.
- Experience in the Google Cloud ecosystem, including BigQuery, Bigtable, Dataproc, Dialogflow, Cloud Storage, and IAM policies.
- Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
- Experience using PL/SQL to write stored procedures, functions, and triggers. Experience includes requirements gathering, design, development, integration, documentation, testing, and build.
- Extensive knowledge of RDBMSs such as Oracle, MS SQL Server, and MySQL.
- Experience in working with NoSQL databases like HBase and Cassandra.
- Well experienced in normalization and denormalization techniques for optimum performance in relational and dimensional database environments.
- Good knowledge of data marts, OLAP, and dimensional data modeling with the Ralph Kimball methodology (star schema and snowflake schema modeling for fact and dimension tables) using Analysis Services.
- Able to work effectively in cross-functional team environments, with excellent communication and interpersonal skills.
- Converted Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
- Developed custom Kafka producers and consumers for publishing and subscribing to Kafka topics.
- Hands-on experience with other Amazon Web Services such as Auto Scaling, Redshift, DynamoDB, and Route 53.
- Experience with operating systems: Linux, RedHat, and UNIX.
- Worked with various programming languages using IDEs such as Eclipse, NetBeans, and IntelliJ, along with tools such as PuTTY and Git.
- Experienced in working in SDLC, Agile and Waterfall Methodologies.
TECHNICAL SKILLS
Big Data Technologies: Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Flume, Impala, HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, ZooKeeper
Hadoop Distribution: Cloudera CDH, Apache, AWS, Hortonworks HDP
Programming Languages: SQL, PL/SQL, Python, R, PySpark, Pig, HiveQL, Scala, Shell Scripting, Regular Expressions
Spark components: RDD, Spark SQL (DataFrames and Datasets), and Spark Streaming
Cloud Infrastructure: AWS, Azure, GCP
Databases: Oracle, Teradata, MySQL, SQL Server, NoSQL databases (HBase, MongoDB)
Scripting & Query Languages: Shell scripting, SQL
Version Control: CVS, SVN, ClearCase, Git
Build Tools: Maven, SBT
Containerization Tools: Kubernetes, Docker, Docker Swarm
IDEs & Other Tools: JUnit, Eclipse, Visual Studio, NetBeans, Azure Databricks, CI/CD, Linux, UNIX, Google Shell
Reporting Tools: Power BI, SAS, Tableau
PROFESSIONAL EXPERIENCE
Confidential, St. Louis, MO
Sr. Data Engineer
Responsibilities:
- Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
- Extracted, transformed, and loaded data from various heterogeneous data sources and destinations using AWS Redshift.
- Used PySpark and Pandas to calculate the moving average and RSI score of stocks and loaded the results into the data warehouse.
- Performed data analysis and design, and created and maintained large, complex logical and physical data models and metadata repositories using ERwin and MB MDR.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, PostgreSQL, DataFrames, OpenShift, Talend, and pair RDDs.
- Involved in integrating the Hadoop cluster with the Spark engine to perform batch and GraphX operations.
- Performed data preprocessing and feature engineering for further predictive analytics using Python Pandas.
- Developed and validated machine learning models including Ridge and Lasso regression for predicting total amount of trade.
- Designed SSIS packages to extract, transform, and load (ETL) existing data into SQL Server from different environments for the SSAS cubes (OLAP).
- Boosted the performance of regression models by applying polynomial transformation and feature selection, and used those methods to select stocks.
- Generated report on predictive analytics using Python and Tableau including visualizing model performance and prediction results.
- Experienced in day-to-day DBA activities including schema management, user management (creating users, synonyms, privileges, roles, quotas, tables, indexes, sequences), space management (tablespaces, rollback segments), monitoring (alert log, memory, disk I/O, CPU, database connectivity), job scheduling, and UNIX shell scripting.
- Expertise in using Docker to run and deploy applications in multiple containers, with tools like Docker Swarm and Docker Wave.
- Developed complex Talend ETL jobs to migrate data from flat files to the database. Pulled files from the mainframe into the Talend execution server using multiple FTP components.
- Implemented Copy activities and custom Azure Data Factory pipeline activities.
- Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Migrated on-premises data (Oracle/SQL Server/DB2/MongoDB) to Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF V1/V2).
- Imported data from MySQL into HDFS and vice versa using Sqoop, and configured the Hive metastore with MySQL, which stores the metadata for Hive tables.
- Designed Sqoop scripts to load data from Teradata and DB2 into the Hadoop environment, and shell scripts to transfer data from Hadoop to Google Cloud Storage (GCS) and from GCS to BigQuery.
- Architected and designed CI/CD for serverless applications using the AWS Serverless Application Model (Lambda).
- Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
- Exposure to Spark, Spark Streaming, Spark MLlib, Snowflake, and Scala, including creating and handling DataFrames in Spark with Scala.
- Good exposure to MapReduce programming using Java, Pig Latin scripting, distributed applications, and HDFS.
- Good understanding of NoSQL databases and hands-on experience writing applications on HBase, Cassandra, and MongoDB.
- Developed custom Kafka producers and consumers for publishing and subscribing to Kafka topics.
- Migrated MapReduce jobs to Spark jobs to achieve better performance.
- Designed the MapReduce and YARN flow, wrote MapReduce scripts, and handled performance tuning and debugging.
- Used Scala and Spark to improve the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
- Developed stored procedures and views in Snowflake and used them in Talend for loading dimensions and facts.
- Used Git for version control with colleagues.
Environment: HDFS, Hive, Spark (PySpark, Spark SQL, Spark MLlib), Kafka, Linux, Python 3.x (Scikit-learn, NumPy, Pandas), Tableau 10.1, GitHub, Azure, AWS EMR/EC2/S3/Redshift, Pig, MapReduce, Cassandra, Data Lake, Sqoop, Oozie, MySQL, Oracle, Shell Scripting, Git.
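Illustrative sketch (not project code): the moving average and RSI calculation mentioned in this role can be outlined as below. It uses only the Python standard library rather than PySpark or Pandas, and it assumes a trailing simple moving average and an RSI based on simple averages of gains and losses (not Wilder smoothing). The function names and the default period of 14 are assumptions made for the example.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing simple moving average; None until the window is full."""
    out: List[Optional[float]] = []
    for i in range(len(prices)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(prices[i + 1 - window : i + 1]) / window)
    return out


def rsi(prices: List[float], period: int = 14) -> Optional[float]:
    """RSI over the last `period` price changes, using simple averages.

    Returns None when there are not enough prices. When there are no
    losses in the window this sketch returns 100.0 (a convention choice).
    """
    if len(prices) < period + 1:
        return None
    # Pair each price with its successor to get the last `period` changes.
    changes = [b - a for a, b in zip(prices[-period - 1 : -1], prices[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

In a Pandas or PySpark setting the same calculations would typically be expressed with rolling or window functions over a price column.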
Confidential, Topeka, KS
Sr. Data Engineer / Big Data Engineer
Roles & Responsibilities:
- Worked with the Azure cloud platform (HDInsight, Databricks, Data Lake, Blob Storage, Data Factory, Synapse, SQL DB, and SQL DWH).
- Used Azure Data Factory, the SQL API, and the MongoDB API to integrate data from MongoDB, MS SQL, and cloud sources (Blob Storage, Azure SQL DB).
- Created ADF pipelines using Linked Services, Datasets, and Pipelines to extract, transform, and load data to and from various sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and the write-back tool.
- Designed end to end scalable architecture to solve business problems using various Azure Components like HDInsight, Data Factory, Data Lake, Storage and Machine Learning Studio.
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL Activity.
- Performed data purging and applied changes using Databricks and Spark data analysis.
- Gathered data and business requirements from end users and management. Designed and built data solutions to migrate existing source data in Teradata and DB2 to BigQuery (Google Cloud Platform).
- Performed data manipulation on extracted data using Python Pandas.
- Worked with subject matter experts and the project team to identify, define, collate, document, and communicate the data migration requirements.
- Built custom Tableau / SAP BusinessObjects dashboards for Salesforce that accept parameters from Salesforce and show the relevant data for the selected object.
- Hands-on Ab Initio ETL, data mapping, transformation, and loading in complex, high-volume environments.
- Validated Sqoop jobs and shell scripts, and performed data validation to check that data was loaded correctly without discrepancies. Performed migration and testing of static and transaction data from one core system to another.
- Developed best practices, processes, and standards for carrying out data migration activities effectively. Worked across multiple functional projects to understand data usage and implications for data migration.
- Extracted files from Hadoop and dropped them into S3 on a daily and hourly basis. Worked with data governance and data quality to design various models and processes.
- Involved in all steps and scope of the project's reference data approach to MDM; created a data dictionary and source-to-target mapping in the MDM data model.
- Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate them with other Azure services. Knowledge of U-SQL.
- Prepared data migration plans including migration risk, milestones, quality, and business sign-off details.
- Created functions and assigned roles in AWS Lambda to run Python scripts, and used AWS Lambda with Java to perform event-driven processing. Created Lambda jobs and configured roles using the AWS CLI.
- Oversaw the migration process from a business perspective. Coordinated between leads, the process manager, and the project manager. Performed business validation of uploaded data.
- Moved data from FS to S3 using Spark commands.
- Built S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs.
- Using Nebula Metadata, registered Business and Technical Datasets for corresponding SQL scripts
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Pig. Used Sqoop to import and export data between MySQL, HDFS, and NoSQL databases on a regular basis, and developed Pig scripts to process the data in batches for trend analysis.
- Experience in designing and developing POCs in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Closely involved in scheduling daily and monthly jobs with preconditions and postconditions based on the requirements.
- Monitored the daily, weekly, and monthly jobs and provided support in case of failures or issues.
Environment: Hadoop, MapReduce, Azure, AWS Lambda, ADF, Snowflake, HDFS, Hive, MySQL, SQL Server, Tableau, Spark, SSIS, Sqoop jobs, Shell scripts, Ab Initio ETL
Confidential, Charlotte, NC
Data Engineer
Roles & Responsibilities:
- Experience in building and architecting multiple Data pipelines, end to end ETL and ELT process for Data ingestion and transformation in GCP
- Strong understanding of AWS components such as EC2 and S3
- Implemented a continuous delivery pipeline with Docker and GitHub.
- Used Google Cloud Functions with Python to load newly arrived CSV files from a GCS bucket into BigQuery.
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
- Devised simple and complex SQL scripts to check and validate data flow in various applications.
- Used Hive to implement data warehouse and stored data into HDFS. Stored data into Hadoop clusters which are set up in AWS EMR.
- Performed Data Preparation by using Pig Latin to get the right data format needed.
- Used Python Pandas, NiFi, Jenkins, and TextBlob to complete the ETL process for clinical data for future NLP analysis.
- Utilized the clinical data to generate features to describe the different illnesses by using LDA Topic Modelling.
- Used PCA to reduce dimensionality by computing eigenvalues and eigenvectors, and used OpenCV to analyze CT scan images to identify disease.
- Processed the image data through the Hadoop distributed system using MapReduce, then stored the results in HDFS.
- Created Session Beans and controller Servlets for handling HTTP requests from Talend.
- Performed data visualization and designed dashboards with Tableau, and generated complex reports including charts, summaries, and graphs to interpret the findings for the team and stakeholders.
- Wrote documentation for each report including purpose, data source, column mapping, transformation, and user group.
- Used Git for version control with the data engineering team and data scientist colleagues.
- Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python.
- Developed and deployed data pipelines in cloud environments such as AWS and GCP.
- Performed data engineering functions: data extract, transformation, loading, and integration in support of enterprise data infrastructures - data warehouse, operational data stores and master data management
- Responsible for data services and data movement infrastructures
- Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Good experience with ETL concepts, building ETL solutions, and data modeling.
- Architected several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
- Hands-on experience architecting the ETL transformation layers and writing Spark jobs to do the processing.
- Gathered and processed raw data at scale (including writing scripts, web scraping, calling APIs, writing SQL queries, and writing applications).
- Experience in fact and dimension modeling (star schema, snowflake schema), transactional modeling, and SCDs (slowly changing dimensions).
- Devised PL/SQL Stored Procedures, Functions, Triggers, Views and packages. Made use of Indexing, Aggregation and Materialized views to optimize query performance.
- Developed logistic regression models in Python to predict subscription response rate based on customer variables such as past transactions, responses to prior mailings, promotions, demographics, interests, and hobbies.
- Developed a near-real-time data pipeline using Spark.
- Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, gsutil, bq command-line utilities, Dataproc, and Stackdriver.
- Involved in writing SQL Queries and used Joins to access data from Oracle, and MySQL.
- Implemented Apache Airflow for authoring, scheduling and monitoring Data Pipelines
- Proficient in machine learning techniques (decision trees, linear and logistic regression) and statistical modeling.
- Worked with Confluence and Jira.
- Skilled in data visualization with libraries such as Matplotlib and Seaborn.
- Hands on experience with big data tools like Hadoop, Spark, Hive
- Experience implementing machine learning back-end pipeline with Pandas, NumPy
Environment: Hadoop, Spark, Hive, GCP, BigQuery, GCS bucket, Cloud Functions, Cloud Dataflow, AWS, Apache Airflow, Python, Pandas, Matplotlib, Seaborn, text mining, Jira, NumPy, PL/SQL, Scala.
Confidential
Data Engineer / Hadoop Spark Developer
Responsibilities:
- As a Big Data Developer, worked on Hadoop cluster scaling from 4 nodes in development environment to 8 nodes in pre-production stage and up to 24 nodes in production.
- Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Built pipelines to move hashed and un-hashed data from XML files to the data lake.
- Developed Spark scripts in Python on Azure HDInsight for data aggregation and validation, and verified their performance against MapReduce jobs.
- Extensively worked with Spark-SQL context to create data frames and datasets to preprocess the model data.
- Experience with Cloud Service Providers such as Amazon AWS, Microsoft Azure, and Google GCP
- Data Analysis: Expertise in analyzing data using Pig scripting, Hive queries, Spark (Python), and Impala.
- Experienced in loading and transforming large sets of Structured, Semi-Structured and Unstructured data and analyzed them by running Hive queries and Pig scripts.
- Designed the row key in HBase to store text and JSON as key values in HBase tables, structuring the key so that gets and scans return rows in sorted order.
- Wrote JUnit tests and integration test cases for the microservices.
- Worked in Azure environment for development and deployment of Custom Hadoop Applications.
- Developed and deployed the output using Spark and Scala code on a Hadoop cluster running on GCP.
- Developed a NiFi workflow to pick up multiple files from an FTP location and move them to HDFS on a daily basis.
- Scripting: Expertise in Hive, PIG, Impala, Shell Scripting, Perl Scripting, and Python.
- Developed business logic using Kafka Direct Stream in Spark Streaming and implemented business transformations.
- Proven experience with ETL frameworks (Airflow, Luigi, or the open-source Garcon).
- Created Hive schemas using performance techniques like partitioning and bucketing.
- Used Hadoop YARN to perform analytics on data in Hive.
- Developed and maintained batch data flow using HiveQL and Unix scripting
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Built large-scale data processing systems for data warehousing solutions, and worked with unstructured data mining on NoSQL.
- Specified the cluster size, resource pool allocation, and Hadoop distribution by writing the specifications in JSON format.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Primarily involved in Data Migration process using Azure by integrating with GitHub repository and Jenkins.
- Demonstrable experience designing and implementing complex applications and distributed systems on public cloud infrastructure (AWS, GCP, Azure, etc.).
- Developed workflow in Oozie to manage and schedule jobs on Hadoop cluster to trigger daily, weekly and monthly batch cycles.
- Configured Hadoop tools like Hive, Pig, Zookeeper, Flume, Impala and Sqoop.
- Wrote Hive Queries for analyzing data in Hive warehouse using Hive Query Language (HQL).
- Queried both Managed and External tables created by Hive using Impala.
- Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment.
- Developed customized Hive UDFs and UDAFs in Java, set up JDBC connectivity with Hive, and developed and executed Pig scripts and Pig UDFs.
- Used Windows Azure SQL Reporting Services to create reports with tables, charts, and maps.
Environment: Hadoop 3.0, Hive 2.3, Azure, Microservices, Java 8, MapReduce, Agile, HBase 1.2, JSON, Spark 2.4, Kafka, JDBC, Pig 0.17
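Illustrative sketch (not project code): the sorted row key design mentioned in this role relies on HBase ordering rows lexicographically by the bytes of the row key. One common pattern, assumed here, is an entity prefix followed by a zero-padded reversed timestamp so that a prefix scan returns the newest rows first. The key layout, the `#` delimiter, the `MAX_TS_MS` bound, and the function name are hypothetical choices for the example.

```python
# Assumed upper bound for millisecond timestamps (13 decimal digits).
MAX_TS_MS = 10**13 - 1


def make_row_key(entity_id: str, ts_ms: int) -> bytes:
    """Build a row key that sorts by entity, then newest-first within it.

    Zero-padding keeps every reversed timestamp the same width, so byte
    order matches numeric order.
    """
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{entity_id}#{reversed_ts:013d}".encode("utf-8")


# Plain byte-wise sorting here mimics the order in which a scan
# would return these keys.
older = make_row_key("user42", 1_700_000_000_000)
newer = make_row_key("user42", 1_700_000_005_000)
scan_order = sorted([older, newer])
```

With this layout, `scan_order` lists the newer key before the older one. Fixed-width entity identifiers would make the ordering across entities easier to reason about.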
Confidential
Data & Reporting Analyst
Roles & Responsibilities:
- Imported Legacy data from SQL Server and Teradata into Amazon S3.
- Created consumption views on top of metrics to reduce the running time for complex queries.
- Exported Data into Snowflake by creating Staging Tables to load Data of different files from Amazon S3.
- Implemented automation, traceability, and transparency for every step of the process to build trust in data and streamline data science efforts using Python, Java, Hadoop Streaming, Apache Spark, Spark SQL, Scala, Hive, and Pig.
- As part of data migration, wrote many SQL scripts to identify data mismatches and loaded historical data from Teradata SQL into Snowflake.
- Developed SQL scripts to upload, retrieve, manipulate, and handle sensitive data (National Provider Identifier data, i.e. name, address, SSN, phone number) in Teradata, SQL Server Management Studio, and Snowflake databases for the project.
- Built S3 buckets, managed bucket policies, and used S3 and Glacier for storage and backup on AWS.
- Created performance dashboards in Tableau / Excel / PowerPoint for the key stakeholders.
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and Scala, with good experience using Spark Shell and Spark Streaming.
- Worked with the business to identify gaps in mobile tracking and devise solutions.
- Analyzed click events on the hybrid landing page, including bounce rate, conversion rate, jump-back rate, and list/gallery view, and provided valuable information for landing page optimization.
- Evaluated the traffic and performance of Daily Deals PLA ads and compared those items with non-daily-deal items to assess the possibility of increasing ROI. Suggested improvements to and modified existing BI components (reports, stored procedures).
- Performed data validation and transformation using Python and Hadoop streaming.
- Prepared Test Plan to ensure QA and Development phases are in parallel
- Created Sqoop jobs and Pig and Hive scripts for data ingestion from relational databases to compare with historical data.
- Implemented a defect tracking process using JIRA by assigning bugs to the development team.
- Automated the regression tool (Qute), reducing manual effort and increasing team productivity.
- Involved in functional, integration, regression, smoke, and performance testing. Tested Hadoop MapReduce jobs developed in Python, Pig, and Hive.
- Created metric tables and end-user views in Snowflake to feed data for Tableau refreshes.
- Generated custom SQL to verify dependencies for the daily, weekly, and monthly jobs.
- Using Nebula Metadata, registered Business and Technical Datasets for corresponding SQL scripts
- Experienced in working with the Spark ecosystem, using Spark SQL and Scala queries on different formats such as text and CSV files.
- Developed Spark code and Spark SQL/Streaming jobs for faster testing and processing of data.
- Closely involved in scheduling daily and monthly jobs with preconditions and postconditions based on the requirements.
- Monitored the daily, weekly, and monthly jobs and provided support in case of failures or issues.
Environment: Snowflake, AWS S3, GitHub, Teradata, SQL Server, Hadoop, MapReduce, Python, Pig, Hive, Apache Spark, Sqoop
