
Data Engineer Resume


Dallas, TX

SUMMARY

  • 7+ years of experience in Data Science, Analytics, Big Data/Cloud Engineering, Snowflake, Data Warehouse, and Data Mart implementations, with a strong focus on analyzing, predicting, mining, engineering, modeling, interpreting, and presenting data.
  • Hands-on experience with the Hadoop framework and its layers - Storage (HDFS), Analysis (Pig and Hive), and Engineering (Jobs and Workflows) - including extending functionality by writing custom UDFs.
  • Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
  • Extensive experience in developing data warehouse applications using Hadoop, Informatica, Oracle, Teradata, and MS SQL Server on UNIX and Windows platforms.
  • Excellent Knowledge of Relational Database Design, Data Warehouse/OLAP concepts, and methodologies.
  • Experience in designing Star schema and Snowflake schema for Data Warehouse and ODS architectures.
  • Expertise in OLTP/OLAP system study, analysis, and E-R modeling, developing database schemas such as Star schema and Snowflake schema used in relational, dimensional, and multidimensional modeling.
  • Experience in creating complex mappings using various transformations and developing strategies for Extraction, Transformation and Loading (ETL) mechanism.
  • Experience in designing and analyzing data using HiveQL and Pig Latin, and in implementing the complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Impala, Sqoop, Oozie, HBase, MongoDB, Spark, and Kafka.
  • Knowledge of AWS (Amazon Web Services) cloud computing infrastructure, with the ability to create Spark Streaming modules for streaming data into a Data Lake.
  • Experienced in Data Architecture and data modeling using Erwin, ER/Studio, and MS Visio.
  • Experience in coding SQL for developing Procedures, Triggers, and Packages.
  • Experience in Dimensional Data Modeling (Star Schema, Snowflake Schema, Fact and Dimension Tables) and in concepts such as Lambda Architecture and Batch processing.
  • Experienced in Python data manipulation for loading and extraction, as well as Python libraries such as NumPy, SciPy, and Pandas for data analysis and numerical computations.
  • Extensive knowledge of Data Modeling, Data Conversions, Data integration and Data Migration.
  • Experience in importing and exporting data using Sqoop between HDFS and Relational Database Systems (RDBMS) such as Oracle, DB2, and SQL Server, in both directions (a PySpark-based sketch of this ingestion pattern follows this summary).
  • Well experienced in Normalization, De-Normalization and Standardization techniques for optimal performance in relational and dimensional database environments.
  • Solid understanding of AWS (Redshift, S3, EC2) and of Apache Spark and Scala processes and concepts.
  • Work experience with UNIX/Linux commands, scripting, and deploying applications on servers.
  • Advanced knowledge of data visualization tools such as Excel, Minitab, Tableau, Power BI, and RStudio. Experienced in performing data visualization and designing dashboards with Tableau and Power BI to generate complex reports, including charts, summaries, and graphs, to interpret findings.
  • Experienced working on NoSQL databases like MongoDB and HBase.
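
As an illustrative aside for the Sqoop-based transfers noted above: the minimal sketch below uses PySpark's JDBC reader as a swapped-in equivalent (Sqoop itself is a command-line tool). The connection string, table, credentials, and Hive table name are hypothetical placeholders, not values from any actual engagement.

# Minimal sketch of an RDBMS-to-HDFS/Hive ingest with PySpark's JDBC reader.
# All connection details below are placeholders; the experience above used Sqoop.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rdbms-to-hdfs-ingest")
    .enableHiveSupport()
    .getOrCreate()
)

# Read a source table over JDBC, splitting the read across executors
# much like Sqoop splits work across mappers.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://db-host:3306/sales")  # placeholder connection string
    .option("dbtable", "orders")                       # placeholder table
    .option("user", "etl_user")
    .option("password", "********")
    .option("numPartitions", 8)
    .option("partitionColumn", "order_id")
    .option("lowerBound", 1)
    .option("upperBound", 10000000)
    .load()
)

# Land the data in HDFS as Parquet and register it as a Hive table.
orders.write.mode("overwrite").format("parquet").saveAsTable("staging.orders")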

TECHNICAL SKILLS

Operating Systems: Unix, Linux (Ubuntu, CentOS), Mac OS, OpenSUSE, Windows 2003/2008/2012/XP/7/8/9X/NT/Vista/10

Hadoop Ecosystem/Distributions: HDFS, MapReduce, YARN, Oozie, Zookeeper, Job Tracker, Task Tracker, Name Node, Data Node, Cloudera, Hortonworks

Big Data Ecosystem: Hadoop, Spark, MapReduce, YARN, Hive, Spark SQL, Impala, Pig, Sqoop, HBase, Flume, Oozie, Zookeeper, Avro, Parquet, Maven, Snappy, Hue

Data Ingestion: Sqoop, Flume, NiFi, Kafka

Cloud Computing Tools: Snowflake, SnowSQL, AWS, Databricks, Azure Data Lake services, Amazon EC2

NoSQL Databases: HBase, Cassandra, MongoDB, CouchDB

Programming Languages: Python (Jupyter Notebook, PyCharm IDE), R, SQL, PL/SQL, SAS

Frameworks: MVC, Struts, Spring, Hibernate

Scripting Languages: Bash, Perl, Python, R

Databases: Snowflake Cloud DB, Oracle, MySQL, Teradata 12/14, DB2 10.5, MS Access, SQL Server 2000/2005/2008/2012, PostgreSQL 9.3, Sybase ASE 11.9.2, Netezza, Amazon RDS

SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export and Import (DTS)

IDE: IntelliJ, Eclipse, Visual Studio, IDLE

Packages and Tools: MS Office, TOAD, SQL Developer & Navigator, SharePoint Portal Server, Visual SourceSafe, SVN, TFS, BTEQ

ETL/Data Tools: TensorFlow Data API, PySpark, Pervasive Cosmos Business Integrator (Data Junction Tool), CTRL-M, DataStage 7.5, Informatica PowerCenter 9.6.1/9.5/8.6.1/8.1/7.1, Talend, Pentaho, Microsoft SSIS, Ab Initio

OLAP Tools: MS Analysis Services, Business Objects & Crystal Reports 9, MS SQL Analysis Manager, DB2 OLAP, Cognos PowerPlay

Warehousing and Modelling/Architect Tools: Erwin 7.3 & 9.5 (Dimensional Data Modelling, Relational DM, Star Schema, Snowflake, Fact and Dimension Tables, Physical and Logical DM, Canonical Modelling), Visio 6.0, ER/Studio, Rational System Architect, IBM InfoSphere DA, MS Visio Professional, DTM, DTS 2000, SSIS, SSAS

Reporting/BI Tools: MS Excel, Tableau, Tableau Server and Reader, Power BI, QlikView, SAP BusinessObjects, Crystal Reports, SSRS, Splunk

PROFESSIONAL EXPERIENCE

Data Engineer

Confidential - Dallas, TX

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and transferred data between MySQL and HDFS in both directions using Sqoop.
  • Created Hive tables for loading and analyzing data.
  • Designed and implemented an ETL framework using Scala and Python to load data from multiple sources into Hive and from Hive to Vertica.
  • Used Spark API over Cloudera Hadoop Yarn to perform analytics on data in Hive.
  • Used HBase on top of HDFS as a non-relational database.
  • Developed Scala scripts using both DataFrames/SQL and RDDs/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Handled large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, and transformations during the ingestion process itself.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a sketch of this pattern follows this list).
  • Implemented Partitions, Buckets, and developed Hive query to process the data and generate the data cubes for visualizing.
  • Extracted image data stored on the local network to conduct Exploratory Data Analysis (EDA), cleaning, and organization. Ran the NFIQ algorithm to ensure data quality by collecting high-scoring images, then created histograms to compare distributions across datasets.
  • Worked extensively with AWS components such as Elastic MapReduce (EMR).
  • Created pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to Extract, Transform, and Load (ETL) data from sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and in the reverse direction.
  • Loaded data using AWS Glue.
  • Used Athena for data analytics.
  • Created various reports using Tableau based on requirements with the BI team.
  • Used DataStax Spark-Cassandra connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping.
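
A minimal sketch of the Kafka-to-Cassandra streaming pattern described in this list, assuming Spark Structured Streaming with the Kafka source and the DataStax Spark-Cassandra connector on the classpath. The broker address, topic, event schema, and keyspace/table names are hypothetical placeholders, not project values.

# Minimal sketch: stream events from Kafka and persist each micro-batch into Cassandra.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("learner-events-stream").getOrCreate()

# Placeholder schema for the incoming JSON events.
event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("course_id", StringType()),
    StructField("event_time", TimestampType()),
])

# Read events from Kafka in near real time.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
    .option("subscribe", "learner-events")               # placeholder topic
    .load()
)

events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
       .select("e.*")
)

# Write each micro-batch to Cassandra through the DataStax connector.
def write_to_cassandra(batch_df, batch_id):
    (batch_df.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="analytics", table="learner_events")  # placeholder keyspace/table
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()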

Data Engineer

Confidential - Omaha, NE

Responsibilities:

  • Gathered, analyzed, and translated business requirements to technical requirements, communicated with other departments to collect client business requirements and access available data.
  • Acquired, cleaned, and structured data from multiple sources and maintained databases/data systems.
  • Performed incremental loads as well as full loads to transfer data from OLTP to a Data Warehouse with a snowflake schema using different data flow and control flow tasks, and provided maintenance for existing jobs.
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premises and cloud data sources.
  • Used Kafka as a message broker to collect large volumes of data and to analyze the collected data in the distributed system.
  • Designed ETL process using Talend Tool to load from Sources to Snowflake through data Transformations.
  • Knowledgeable in partitioning Kafka messages and setting up replication factors in a Kafka cluster.
  • Developed Snowpipes for continuous ingestion of data using event notifications from AWS (S3 bucket).
  • Designed and developed end-to-end ETL processes from various source systems to the Staging area, and from Staging to the Data Marts, including data loads.
  • Loaded data into Snowflake tables from an internal stage using SnowSQL (a sketch of this load pattern follows this list).
  • Prepared the data warehouse with Star/Snowflake schema concepts in Snowflake using SnowSQL.
  • Responsible for data cleaning, feature scaling, and feature engineering using NumPy and Pandas in Python.
  • Conducted Exploratory Data Analysis (EDA) using Python Matplotlib and Seaborn to identify underlying patterns and correlation between features.
  • Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
  • Used Python and R scripting to implement machine learning algorithms for data prediction and forecasting.
  • Experimented with multiple classification algorithms, such as Random Forest and Gradient boosting using Python Scikit-Learn.
  • Built models using Python and PySpark to predict the probability of attendance for various campaigns and events.
  • Performed data visualization and designed dashboards with Tableau, generating complex reports, including charts, summaries, and graphs, to present findings to the team and stakeholders.
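
A rough sketch of the stage-and-COPY load pattern noted in this list. It uses the snowflake-connector-python package so that all examples stay in Python; the same PUT and COPY INTO statements can equally be run from SnowSQL. The account, credentials, stage, file path, and table names are placeholders.

# Rough sketch: upload files to an internal stage, then COPY them into a Snowflake table.
import snowflake.connector

conn = snowflake.connector.connect(
    user="etl_user",          # placeholder credentials
    password="********",
    account="xy12345",
    warehouse="LOAD_WH",
    database="EDW",
    schema="STAGING",
)

cur = conn.cursor()
try:
    # Upload local files into an internal stage.
    cur.execute("PUT file:///data/exports/orders_*.csv @orders_stage AUTO_COMPRESS=TRUE")

    # Load the staged files into the target table.
    cur.execute("""
        COPY INTO staging.orders
        FROM @orders_stage
        FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"' SKIP_HEADER = 1)
        ON_ERROR = 'ABORT_STATEMENT'
    """)
finally:
    cur.close()
    conn.close()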

PL/SQL Developer

Confidential

Responsibilities:

  • Analyzed business requirements, system requirements, data mapping requirement specifications, and was responsible for documenting functional requirements and supplementary requirements in Quality Centre.
  • Tested Complex ETL Mapping and Sessions based on business user requirements and business rules to load data from source flat files and RDBMS tables to target tables.
  • Responsible for different Data mapping activities from Source systems to EDW, ODS and data marts.
  • Delivered file in various file formatting systems (ex. Excel file, Tab-delimited text, Comma-separated text, Pipe delimited text, etc.)
  • Performed ad hoc analyses as needed, with the ability to interpret and explain the analysis.
  • Involved in Teradata SQL development, unit testing, and performance tuning to ensure testing issues were resolved using defect reports.
  • Tested the database for field size validation, check constraints, and stored procedures, and cross-verified field sizes defined within the application against metadata.
  • Installed, designed, and developed the SQL Server database.
  • Created a logical design of the central relational database using Erwin.
  • Configured the DTS packages to run at periodic intervals.
  • Worked extensively with DTS to load data from source systems and run loads at periodic intervals.
  • Worked with data transformations in both normalized and de-normalized data environments.
  • Worked on query optimization, stored procedures, views, and triggers (a sketch of this stored-procedure work follows this list).
  • Assisted in OLAP and Data Warehouse environment when assigned.
  • Created tables, views, triggers, stored procedures, and indexes.
  • Designed and implemented database replication strategies for both internal and Disaster Recovery.
  • Created FTP connections, database connections for the sources and targets.
  • Maintained security and data integrity of the database.
  • Developed several forms & reports using Crystal Reports.
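
A brief, hypothetical sketch of the stored-procedure work noted in this list, shown from Python via pyodbc to keep all examples in one language; the actual development was done directly in SQL Server. The connection string, procedure, and table are placeholders.

# Hypothetical sketch: creating and invoking a simple SQL Server stored procedure via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sql-host;DATABASE=SalesDB;UID=etl_user;PWD=********"  # placeholder connection
)
cur = conn.cursor()

# Create a stored procedure that returns orders for a given customer (placeholder schema).
cur.execute("""
    CREATE PROCEDURE dbo.usp_GetCustomerOrders
        @CustomerId INT
    AS
    BEGIN
        SET NOCOUNT ON;
        SELECT OrderId, OrderDate, TotalAmount
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId;
    END
""")
conn.commit()

# Call the procedure with a parameter and fetch the results.
cur.execute("EXEC dbo.usp_GetCustomerOrders @CustomerId = ?", 42)
for row in cur.fetchall():
    print(row.OrderId, row.OrderDate, row.TotalAmount)

cur.close()
conn.close()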
