Senior Data Engineer/Analyst/Data Scientist Resume

SUMMARY

  • Around 10 years of experience in IT, including implementation of data warehousing projects with Teradata. Strong understanding of the data warehouse project development life cycle. Expertise in Teradata/Netezza/Redshift database design, implementation, and maintenance, mainly in data warehouse environments.
  • Experience with Hadoop-based data environments, moving data from source systems such as the DWH into HDFS using Sqoop (import/export) and analyzing it with Hive.
  • Trained and well versed in data preprocessing and visualization techniques in data science (supervised and unsupervised algorithms) using Python libraries (scikit-learn, pandas, seaborn, matplotlib, etc.).
  • Experience implementing data migration projects from on-prem Oracle and Teradata to a cloud data lake, and building ELT (bulk and CDC loads) and ETL data pipelines in the Snowflake environment (SnowSQL, Snowpipe, streams, etc.).
  • Expertise in handling AWS services such as S3 for storage management and creating/configuring/integrating IAM roles, EC2, etc. Knowledge of handling streaming data using AWS Kinesis.
  • In-depth understanding and usage of Teradata OLAP functions. Proficient in Teradata SQL, stored procedures, macros, views, and indexes (primary, secondary, PPI, join indexes, etc.). 3+ years of experience in Teradata production support.
  • Around 5+ years of experience working with data integration and business intelligence tools (Tableau, Business Objects) and ETL tools such as Informatica.
  • Experience working with OLTP systems such as MS SQL Server and MySQL relational databases, and handling JSON files (web APIs using the POST method), XML, and flat files as source systems.
  • Proficient in programming with Python and SQL.
  • Certified in data science and knowledgeable in algorithms and predictive modeling concepts (regression techniques such as simple/multiple linear regression, and classification techniques such as logistic regression, support vector machines, decision trees, and random forests).
  • Strong data modeling experience with ODS and dimensional data modeling methodologies such as star schema and snowflake schema. Designed and developed OLAP models consisting of multi-dimensional cubes and drill-through functionality for data analysis.
  • Well versed in UNIX shell scripting.

TECHNICAL SKILLS

Languages: SQL, PL/SQL, Python, COBOL, JCL.

Operating Systems: Windows, Unix/Linux.

Database/DWH: Snowflake (SnowSQL & Snowpipe), MySQL, DB2 7.0/8.0/9.0, Oracle 11g/10g/9i/8i/8, Teradata 13.10/14, SQL Server 2008, Netezza & Redshift.

File Systems/Formats: HDFS, JSON, XML, CSV, Flat Files.

ETL/BI Tools: Matillion ETL, Informatica PowerCenter 9.x, IDQ 10.1 (Analyst & Developer), Business Objects, Tableau, OBIEE.

Other Tools: SFTP clients (FileZilla), Tidal, Autosys, Control-M, Sqoop, Hive, Pig, Erwin, Oracle SQL Developer, Teradata SQL Assistant, AWS S3.

PROFESSIONAL EXPERIENCE

Confidential

Senior Data Engineer/Analyst/Data Scientist

Responsibilities:

  • Collaborate with developers and business users to gather required data, execute ETL programs and scripts, implement data warehouse activities, and prepare reports on the same.
  • Perform root cause analysis on all processes, resolve production issues, validate data, run routine tests on databases, and provide support for all ETL applications.
  • Develop and test ETL code, analyze data, and design data mapping techniques for all data models in the systems.
  • Implement data mapping from source to target systems along with data profiling on relational database tables.
  • Responsible for data quality and data cleansing of daily workloads to ensure correctness and completeness of data patterns.
  • Responsible for troubleshooting, identifying, and resolving data problems; worked with business owners to determine data requirements and identify data sources, and provided estimates for task duration.
  • Analyzed business requirements and system specifications to understand the application.
  • Developed ELT data pipelines using a combination of Python and Snowflake.
  • Utilized Python libraries such as boto3, pandas, and NumPy (arithmetic operations) for AWS work.
  • Created AWS DMS replication instances to pick up data from the source endpoint and place it in the target endpoint.
  • Created Snowflake roles, databases, warehouses, and schemas, and granted permissions on the same.
  • Implemented a storage integration to access S3 files and load data into S3.
  • Implemented CDC logic using Snowpipe, streams, and tasks (see the stream/task sketch after this list).
  • Created a Snowflake external table holding the entire data set, including history, and a view on top of it that displays only the current state of the data by filtering out historical rows.
  • Scheduled jobs using the AWS CloudWatch service and monitored hourly, daily, and weekly frequencies.
  • Set up AWS SNS for reporting error messages and error handling (see the boto3 sketch after this list).
  • Loaded data into the warehouse from various flat files.
  • This project involved migrating on-premises Oracle data to a cloud data lake (AWS S3) and implementing an ELT pipeline using Snowflake and Python.
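
Below is a minimal sketch of the stream-plus-task CDC pattern referenced above, written with the snowflake-connector-python package. All object names (RAW_ORDERS, ORDERS_STREAM, ORDERS_MERGE_TASK, the EDW database) and the column list are hypothetical placeholders, not the project's actual schema.

# Sketch of Snowflake CDC via a stream and a scheduled task (names hypothetical).
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",   # hypothetical account identifier
    user="etl_user",
    password="***",
    warehouse="ETL_WH",
    database="EDW",
    schema="STAGE",
)
cur = conn.cursor()

# Capture inserts/updates/deletes landing in the raw table.
cur.execute("CREATE STREAM IF NOT EXISTS ORDERS_STREAM ON TABLE RAW_ORDERS")

# A task that wakes up on a schedule and merges only the changed rows.
cur.execute("""
    CREATE TASK IF NOT EXISTS ORDERS_MERGE_TASK
      WAREHOUSE = ETL_WH
      SCHEDULE  = '15 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      MERGE INTO EDW.CORE.ORDERS tgt
      USING ORDERS_STREAM src
        ON tgt.ORDER_ID = src.ORDER_ID
      WHEN MATCHED THEN UPDATE SET tgt.STATUS = src.STATUS
      WHEN NOT MATCHED THEN INSERT (ORDER_ID, STATUS)
                            VALUES (src.ORDER_ID, src.STATUS)
""")
cur.execute("ALTER TASK ORDERS_MERGE_TASK RESUME")  # tasks are created suspended
conn.close()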
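
The S3 staging and SNS error reporting can be combined in a small boto3 helper. The sketch below is illustrative only; the bucket, topic ARN, and file names are hypothetical.

# Sketch of the S3 load + SNS error-reporting pattern with boto3.
import boto3

s3 = boto3.client("s3")
sns = boto3.client("sns")
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:etl-alerts"  # hypothetical

def stage_file(local_path: str, bucket: str, key: str) -> None:
    """Upload one extract file to the S3 landing zone; alert on failure."""
    try:
        s3.upload_file(local_path, bucket, key)
    except Exception as exc:
        sns.publish(
            TopicArn=TOPIC_ARN,
            Subject="ELT load failure",
            Message=f"Upload of {local_path} to s3://{bucket}/{key} failed: {exc}",
        )
        raise

stage_file("orders_20200101.csv", "edw-landing", "oracle/orders/orders_20200101.csv")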

Technologies Used: SQL Server, Oracle, Snowflake, Matillion (ETL), Python, SnowSQL, Flat files, SQL, AWS, Erwin, JIRA.

Confidential

Senior Data Engineer/Analyst/Data Scientist

Responsibilities:

  • Analyzed business requirements and system specifications to understand the application.
  • Imported data from source files such as flat files using Teradata load utilities: FastLoad, MultiLoad, and TPump.
  • Created ad hoc reports using FastExport and BTEQ, and gained data visualization experience by placing the data in a visual context.
  • Designed Informatica mappings to propagate data from various legacy source systems to Oracle. The interfaces were staged in Oracle before loading to the data warehouse.
  • Performed data transformations using various Informatica transformations such as Union, Joiner, Expression, Lookup, Aggregator, Filter, Router, Normalizer, and Update Strategy.
  • Responsible for tuning report queries and ad hoc queries.
  • Wrote transformations to convert data into the required form based on client requirements, using Teradata ETL processes.
  • Extracted files from Teradata tables and placed them in the Hadoop Distributed File System (HDFS) using Sqoop, a Java-based tool (see the Sqoop sketch after this list).
  • Experienced in tuning SQL statements and procedures to enhance load performance across schemas and databases, and tuned queries to improve report refresh time.
  • Created customized Web Intelligence reports from various sources of data.
  • Worked as a programmer analyst with expertise in Tableau Server, ETL, Teradata, and other EDW data integration and development.
  • Involved in performance tuning of the source and target databases and of digital media data for querying and data loading.
  • Developed SQL scripts and shell scripts to move data from source systems to staging and from staging to the data warehouse in batch processing mode, and used cloud platforms (AWS, GCP) to retrieve data.
  • Involved in building and deploying cloud-based data pipelines and BI applications using AWS and GCP services.
  • Built data pipelines to ingest structured data.
  • Explored and performed POCs on Google Cloud Platform (including Cloud Machine Learning, Cloud Datastore, Bigtable, BigQuery, Datalab, and Data Studio).
  • Exported data from the Teradata database using Teradata FastExport.
  • Used UNIX scripts to run Teradata DDL in BTEQ and write to a log table.
  • Created, loaded, and materialized views to extend the usability of data.
  • Automated UNIX shell scripts to verify the count of records added each day by the incremental data load for a few of the base tables, in order to check for consistency (see the row-count sketch after this list).
  • Made the modifications required for the reporting process by understanding the existing data model; involved in retrieving data from relational databases.
  • Handled SSA requestor responsibilities assigned for both project and support requests.
  • Managed queries by creating, deleting, modifying, and viewing them, and by enabling and disabling rules.
  • Loaded data into the warehouse from various flat files.
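
A Sqoop extract like the one above can be driven from Python. The sketch below is illustrative: the host, password file, table, and HDFS path are hypothetical, and it assumes the sqoop client and a Teradata JDBC driver are installed on the edge node.

# Sketch: Sqoop import from Teradata into HDFS, launched via subprocess.
import subprocess

cmd = [
    "sqoop", "import",
    "--connect", "jdbc:teradata://td-prod/DATABASE=EDW",
    "--username", "etl_user",
    "--password-file", "/user/etl/.td_pass",  # safer than an inline --password
    "--table", "SALES_DAILY",
    "--target-dir", "/data/raw/sales_daily",
    "--num-mappers", "4",
    "--as-textfile",
]
result = subprocess.run(cmd, capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f"Sqoop import failed:\n{result.stderr}")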
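
The daily row-count consistency check could equally be written in Python with the teradatasql driver. The table list and the LOAD_DT audit column below are hypothetical stand-ins for the real base tables.

# Sketch: verify that the incremental load added rows to each base table today.
import teradatasql

TABLES = ["EDW.CUSTOMER", "EDW.ORDERS", "EDW.ORDER_LINES"]  # hypothetical

con = teradatasql.connect(host="td-prod", user="audit_user", password="***")
cur = con.cursor()
try:
    for table in TABLES:
        # Assumes each base table carries a LOAD_DT audit column (hypothetical).
        cur.execute(f"SELECT COUNT(*) FROM {table} WHERE LOAD_DT = CURRENT_DATE")
        added_today = cur.fetchone()[0]
        print(f"{table}: {added_today} rows loaded today")
        if added_today == 0:
            print(f"WARNING: no incremental rows landed in {table}")
finally:
    con.close()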

Technologies Used: Teradata V13.0, MySQL, Hadoop, Java, Business Objects XI R3.1, Python, Mainframes, Oracle 10g, DB2, Teradata SQL Assistant, SQL Server, Flat files, SQL, AWS, Erwin, Linux, Shell Scripting

Confidential - Las Vegas, NV

Senior Data Analyst/Engineer/Data Scientist

Responsibilities:

  • Involved in designing the ETL process to extract, transform, and load data from OLAP sources into the Teradata data warehouse, with Tableau Server on the reporting side.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced in defining job flows and importing and exporting data between HDFS and Hive using Sqoop.
  • Experienced in managing and reviewing Hadoop log files and running Hadoop Streaming jobs to process terabytes of XML-format data.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and supporting the MapReduce programs running on the cluster; involved in loading data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Generated various space reports in Teradata Manager to analyze different kinds of issues.
  • Provided ongoing support by developing processes and executing object migrations, security and access privilege setup, and active performance monitoring.
  • Used Visual Explain, Index Wizard, and Statistics Wizard to tune poorly performing queries, analyze the plans, and implement the recommendations to improve performance.
  • Performed Teradata performance tuning via EXPLAIN plans, PPI, AJIs, indexes, collecting statistics, and rewriting code.
  • Developed BTEQ scripts to load data from the Teradata staging area to the Teradata data mart.
  • Developed scripts to load high-volume data into empty tables using the FastLoad utility (see the FastLoad sketch after this list).
  • Provided daily support, maintenance, and operations support for all Hadoop clusters; offered custom programming, scripting, and code creation for Hadoop MapReduce, along with Java programming and overall Java runtime support.
  • Assisted in troubleshooting, installation, upgrades, and the overall operation of all Hadoop clusters and big data environments.
  • Handled SSA requestor responsibilities assigned for both project and support requests.
  • Worked on different data stores and file formats in web services.
  • Used the FastExport utility to extract large volumes of data at high speed from the Teradata warehouse.
  • Performed performance tuning of Teradata SQL statements against huge volumes of data.
  • Created FastLoad, FastExport, MultiLoad, TPump, and BTEQ scripts to load data from the Oracle database and flat files into the primary data warehouse.
  • Created UNIX scripts for various purposes such as FTP, archiving files, and creating parameter files.
  • Ran scripts through UNIX shell programs under batch scheduling.
  • Created procedures to delete duplicate records from warehouse tables (one common pattern is sketched after this list).
  • Used Informatica debugging techniques to debug data mappings and data visualizations, and used session log files and bad files to trace errors that occurred while loading.
  • Responsible for troubleshooting, identifying, and resolving data problems; worked with analysts to determine data requirements and identify data sources, and provided estimates for task duration.
  • Gathered information from different data warehouse systems and loaded it into the warehouse using FastLoad, FastExport, XML import, MultiLoad, BTEQ, Teradata Parallel Transporter (TPT), and UNIX shell scripts.
  • Generated Business Objects reports involving complex queries, subqueries, unions, and intersections.
  • Involved in unit testing, system testing, integration testing, data validation, and user acceptance testing.
  • Involved in 24x7 production support.
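
A FastLoad job like the ones above can be generated and launched from Python. The control script below is a minimal illustrative example; the TDPID, credentials, staging table, delimiter, and input file are hypothetical.

# Sketch: feed a FastLoad control script to the fastload client via stdin.
import subprocess

FASTLOAD_SCRIPT = """
LOGON td-prod/etl_user,***;
BEGIN LOADING EDW.SALES_STG
    ERRORFILES EDW.SALES_ERR1, EDW.SALES_ERR2;
SET RECORD VARTEXT "|";
DEFINE sale_id (VARCHAR(18)),
       sale_dt (VARCHAR(10)),
       amount  (VARCHAR(18))
FILE = /data/in/sales.txt;
INSERT INTO EDW.SALES_STG VALUES (:sale_id, :sale_dt, :amount);
END LOADING;
LOGOFF;
"""

result = subprocess.run(["fastload"], input=FASTLOAD_SCRIPT,
                        capture_output=True, text=True)
if result.returncode != 0:
    raise RuntimeError(f"FastLoad failed:\n{result.stdout}")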
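
One common Teradata de-duplication pattern, sketched here in Python with the teradatasql driver: copy rows into a SET table, which silently discards exact duplicate rows on INSERT...SELECT, then swap the tables. The table names are hypothetical, and the old table is kept until the swap is verified.

# Sketch: remove exact duplicate rows via a SET table copy-and-swap.
import teradatasql

con = teradatasql.connect(host="td-prod", user="etl_user", password="***")
cur = con.cursor()
try:
    # SET tables enforce row uniqueness, so duplicates are dropped on insert.
    cur.execute("CREATE SET TABLE EDW.SALES_DEDUP AS EDW.SALES WITH NO DATA")
    cur.execute("INSERT INTO EDW.SALES_DEDUP SELECT * FROM EDW.SALES")
    cur.execute("RENAME TABLE EDW.SALES TO EDW.SALES_OLD")
    cur.execute("RENAME TABLE EDW.SALES_DEDUP TO EDW.SALES")
finally:
    con.close()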

Technologies Used: Teradata V13.0, Cloudera Hadoop, HDFS, MapReduce, Java, Sqoop, Hive, Teradata SQL Assistant, SQL Server, Flat files, SQL, Erwin, Windows RDP servers, Linux, Shell Scripting

Confidential - Atlanta, GA

Data Analyst/Hadoop Developer

Responsibilities:

  • Involved in designing the ETL process to extract, transform, and load data from OLAP sources into the Teradata data warehouse.
  • Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
  • Experienced in defining job flows and importing and exporting data between HDFS and Hive using Sqoop.
  • Experienced in managing and reviewing Hadoop log files and running Hadoop Streaming jobs to process terabytes of XML-format data (see the streaming mapper sketch after this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources and supporting the MapReduce programs running on the cluster.
  • Involved in loading data from the UNIX file system into HDFS.
  • Installed and configured Hive and wrote Hive UDFs.
  • Involved in performance tuning of ETL code; reviewed and analyzed the target-based commit interval for optimum session performance.
  • Involved in developing complex ETL job strategies based on dependencies.
  • Developed data extraction, transformation, and loading jobs from flat file, Oracle, SAP, and Teradata sources into Teradata using BTEQ, FastLoad, MultiLoad, and stored procedures.
  • Designed process-oriented UNIX scripts and ETL processes for loading data into the data warehouse.
  • Used stored procedures and created database automation scripts to create databases in different environments.
  • Generated various space reports in Teradata Manager to analyze different kinds of issues.
  • Provided ongoing support by developing processes and executing object migrations, security and access privilege setup, and active performance monitoring.
  • Used Visual Explain, Index Wizard, and Statistics Wizard to tune poorly performing queries, analyze the plans, and implement the recommendations to improve performance.
  • Wrote SQL queries and matched the data between the database and the reports.
  • Tuned and enhanced universes with SQL queries to improve report performance.
  • Created complex reports, including sub-reports, graphical reports, and formula-based, well-formatted reports, according to user requirements, and communicated the data by encoding it as visual objects.
  • Handled SSA requestor responsibilities assigned for both project and support requests.
  • Developed several Informatica mappings, mapplets, and transformations to load data from relational and flat file sources into the data mart.
  • Created UNIX scripts for various purposes such as FTP, archiving files, and creating parameter files.
  • Ran scripts through UNIX shell programs under batch scheduling.
  • Responsible for troubleshooting, identifying, and resolving data problems; worked with analysts to determine data requirements and identify data sources, and provided estimates for task duration.
  • Gathered information from different data warehouse systems and loaded it into the warehouse using FastLoad, FastExport, XML import, MultiLoad, BTEQ, Teradata Parallel Transporter (TPT), and UNIX shell scripts.
  • Involved in unit testing, system testing, integration testing, data validation, and user acceptance testing.
  • Involved in 24x7 production support.
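
Hadoop Streaming lets a mapper be written in any language that reads stdin and writes stdout. The sketch below is a minimal Python mapper for the kind of line-oriented XML feed mentioned above; the <event type="..."> attribute is a hypothetical stand-in for the real schema, and a downstream reducer would sum the emitted (type, 1) pairs.

#!/usr/bin/env python
# Sketch: Hadoop Streaming mapper that counts events per type in
# line-oriented XML records (schema hypothetical).
import re
import sys

EVENT_RE = re.compile(r'<event\s+type="([^"]+)"')

for line in sys.stdin:
    match = EVENT_RE.search(line)
    if match:
        # Standard streaming contract: tab-separated key/value on stdout.
        print(f"{match.group(1)}\t1")

It would be submitted with the usual streaming invocation, along the lines of hadoop jar hadoop-streaming.jar -input /data/xml -output /data/event_counts -mapper mapper.py -file mapper.py (paths hypothetical), paired with a summing reducer.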
