Data Modeler Resume

Denver, CO

SUMMARY

  • Around 8 years of professional experience in the analysis, design, development, deployment, and maintenance of software and Big Data applications.
  • Experience with Snowflake, creating and maintaining tables and views, and with Python libraries including PySpark, Pytest, PyMongo, PyExcel, Psycopg, NumPy, and Pandas (a minimal PySpark sketch follows this list).
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, and Kafka.
  • Strong experience working with Linux/UNIX environments and writing shell scripts.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Wrote shell and Python scripts to automate end-to-end processes and scheduled those jobs in Oozie/Fuse.
  • Experience using various Hadoop Distributions (Cloudera, Hortonworks) to fully implement and leverage new Hadoop features.
  • Optimized Hive queries and performed analytics on structured data using Hive queries, operations, joins, query tuning, and SerDes.
  • Extensive experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse, as well as managing and granting database access and migrating on-premises databases to Azure Data Lake stores using Azure Data Factory.
  • Architected complete, scalable data pipelines and data warehouses for optimized data ingestion.
  • Collaborated with data scientists and architects on several projects to create data marts per requirements.
  • Worked with Spark libraries like SparkContext and SQLContext using Scala programming.
  • Experience in retrieving data from databases like Oracle, SAP HANA, Teradata, and MySQL into HDFS using Sqoop.
  • Extensive experience in developing Pig Latin scripts and using Hive Query Language for data analytics.
  • Exposure to other tools such as Safe Software FME, Alteryx, and Spotfire.
  • Created views in Impala for reporting to analysis service using ODBC.
  • Involved in end-to-end implementation of ingesting data from different sources and exporting to reporting tools using Apache NiFi and StreamSets.
  • Experience with source control repositories such as SVN and Git.
  • Skilled at building and deploying multi-module applications using Maven and Ant, integrated with CI servers like Jenkins.
  • Good understanding of and experience with software development methodologies like Agile and Waterfall.
  • Performed ETL for Data Scientists to migrate data from Enterprise Cluster.
  • Great team player and quick learner with effective communication, motivation, and organizational skills combined with attention to detail and business improvement.
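
For illustration, a minimal PySpark sketch of the kind of work summarized above: reading a file from HDFS, running a Spark SQL aggregation, and handing a small result to pandas. The file path, column names, and Hive metastore configuration are assumptions, not details from any specific engagement.

```python
# Minimal PySpark sketch: read a CSV from HDFS, run a Spark SQL aggregation,
# and pull a small result set into pandas for downstream analysis.
# File paths and column names are illustrative placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("summary-sketch")
    .enableHiveSupport()   # assumes a Hive metastore is configured
    .getOrCreate()
)

orders = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("hdfs:///data/raw/orders.csv")   # hypothetical HDFS path
)

orders.createOrReplaceTempView("orders")
daily_totals = spark.sql("""
    SELECT order_date, SUM(amount) AS total_amount
    FROM orders
    GROUP BY order_date
    ORDER BY order_date
""")

# Convert a small aggregate to pandas for reporting or plotting.
daily_totals_pd = daily_totals.limit(100).toPandas()
print(daily_totals_pd.head())
```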

TECHNICAL SKILLS

Big Data Ecosystems: HDFS, Apache NiFi, Cloudera Navigator, Sqoop, StreamSets, Kafka, Hive, Pig, Spark, Impala, Oozie, YARN, RabbitMQ.

Development Methodologies: Agile/Scrum, RAD, JAD, System Development Life Cycle (SDLC).

Hadoop Distributions: Apache Hadoop 1.x, Apache Hadoop 2.x, Cloudera, Hortonworks

Databases: Oracle, Vertica, Teradata, PostgreSQL, SAP Hana.

Programming languages: Python, Scala, Shell Scripting.

Cloud: Azure, AWS

Operating Systems: Linux, UNIX, Mac OS-X, CentOS, Windows 8, Windows 7 and Windows Server 2008/2003

Version Control: SVN, CVS, GIT

Web Development: HTML, XML, CSS

PROFESSIONAL EXPERIENCE

Confidential - Denver, CO

Data Modeler

Responsibilities:

  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Utilized SDLC and Agile methodologies such as SCRUM.
  • Involved in administrative tasks, including creation of database objects such as database, tables, and views, using SQL, DDL, and DML requests.
  • Worked on data analysis, data profiling, data modeling, and data governance, identifying data sets, source data, source metadata, data definitions, and data formats.
  • Loaded multi-format data from various sources such as flat files, Excel, and MS Access, and performed file-system operations.
  • Used T-SQL stored procedures to transfer data from OLTP databases to staging area and finally transfer into data marts.
  • Worked on physical design for both SMP and MPP RDBMS, with an understanding of RDBMS scaling features.
  • Wrote SQL queries, dynamic queries, sub-queries, and complex joins for generating complex stored procedures, triggers, user-defined functions, views, and cursors.
  • Wrote simple and advanced SQL queries and scripts to create standard and ad hoc reports for senior managers.
  • Performed ETL SQL optimization, designed the OLTP system environment, and maintained metadata documentation.
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats.
  • Worked with developers on data normalization and de-normalization and performance tuning issues, and aided in stored procedures as needed.
  • Used Teradata for OLTP systems by generating models to support Revenue Management Applications that connect to SAS.
  • Created SSIS Packages for import and export of data between Oracle database and others like MS Excel and Flat Files.
  • Worked in the capacity of ETL Developer (Oracle Data Integrator (ODI) / PL/SQL) to migrate data from various sources into the target Oracle Data Warehouse.
  • Designed and Developed PL/SQL procedures, functions and packages to create Summary tables.
  • Involved in creating tasks to pull and push data from Salesforce to Oracle Staging/Data Mart.
  • Created VBA macros to convert the Excel input files into the correct format and loaded them to SQL Server (a minimal pandas sketch follows this list).
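
For illustration, a minimal sketch of one way the Excel-to-SQL-Server staging loads described above could look in Python with pandas and SQLAlchemy; the connection string, file path, and table names are hypothetical placeholders, and downstream T-SQL procedures are assumed to move the staged rows into the data mart.

```python
# Minimal sketch: load an Excel input into a SQL Server staging table with
# pandas and SQLAlchemy. Connection details, paths, and names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Assumes an ODBC driver for SQL Server is installed and configured.
engine = create_engine(
    "mssql+pyodbc://etl_user:etl_pass@db-server/StagingDB"
    "?driver=ODBC+Driver+17+for+SQL+Server"
)

# Read the Excel input and normalize column names before loading.
df = pd.read_excel("input/customer_feed.xlsx", sheet_name="Sheet1")
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Append into the staging table; T-SQL procedures then move it to the data mart.
df.to_sql("stg_customer_feed", engine, if_exists="append", index=False)
```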

Confidential - Plano, TX

Big Data Engineer

Responsibilities:

  • Processed Big Data using a Hadoop cluster consisting of 40 nodes.
  • Designed and configured Flume servers to collect data from the network proxy servers and store it to HDFS.
  • Loaded the customer profile, customer spending, and credit data from legacy warehouses onto HDFS using Sqoop.
  • Applied transformations and filtered the traffic data using Pig.
  • Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
  • Performed unit testing using MRUnit.
  • Used Spark Streaming APIs to perform on-the-fly transformations and actions for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a minimal PySpark sketch follows this section).
  • Imported data from different sources into HDFS using Spark, analyzed data using SQLContext in Scala, and created tables in Hive using HiveContext.
  • Consolidated the small files for large set of data using Spark Scala to create table on the data.
  • Worked on tuning the performance of Apache NiFi workflow to optimize the ingestion speeds.
  • Used Spark SQL to process huge amounts of structured data and implemented Spark RDD transformations and actions.
  • Wrote Scala applications running on an Amazon EMR cluster that fetch data from Amazon S3 and queue it in Amazon SQS (Simple Queue Service).
  • Created an AWS Lambda function and configured it to receive events from an S3 bucket.
  • Developed the ETL Data pipeline for data loading from centralized Data Lake/ AWS service S3 as a data source to Postgres (RDBMS) using Spark.
  • Used Cloud watch to monitor logs and log metrics generated by applications.
  • Experienced in making data consumable by other tools such as Safe Software FME, Alteryx, and Spotfire.
  • Worked with StreamSets Data Collector to load data into HDFS from various sources such as local files, RDBMS databases, and APIs.
  • Participated in developing a Node.js application to pull data in JSON format from Impala tables.
  • Worked with the WebHDFS REST API, which provides web-service access to data stored in HDFS.
  • Developed various shell scripts.

Environment: Cloudera, HDFS, Sqoop, Hive, Kafka, NiFi, StreamSets, Shell Scripting, Spark, Scala, WebHDFS, HBase, Oracle, Node.js, Amazon SQS, SQLContext.
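
For illustration, a minimal PySpark Structured Streaming sketch of the Kafka-to-Cassandra flow described above; the topic, event schema, keyspace, and table names are hypothetical, and the Kafka source and spark-cassandra-connector packages are assumed to be available on the Spark classpath.

```python
# Minimal Structured Streaming sketch: read learner events from Kafka, parse the
# JSON payload, and persist each micro-batch to Cassandra via the connector.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("learner-stream-sketch").getOrCreate()

event_schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
    .option("subscribe", "learner-events")               # hypothetical topic
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

def write_to_cassandra(batch_df, batch_id):
    # Append each micro-batch to a Cassandra table (hypothetical keyspace/table).
    (batch_df.write
     .format("org.apache.spark.sql.cassandra")
     .options(keyspace="learner", table="events")
     .mode("append")
     .save())

query = events.writeStream.foreachBatch(write_to_cassandra).start()
query.awaitTermination()
```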

Confidential - Charlotte, NC

Big Data Engineer

Responsibilities:

  • Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow (a minimal DAG sketch follows this section).
  • Developed Python scripts to automate the ETL process using Apache Airflow.
  • Used Apache NiFi to ingest the real time OLTP data from disparate sources to HDFS and Kafka.
  • Involved in moving large volumes of generated structured and unstructured data and log data from various sources to Hadoop clusters for further processing.
  • Wrote scripts to continuously dump and reload data between Oracle and HDFS using Sqoop.
  • Implemented HQL scripts to create Hive tables and to load, analyze, merge, bin, backfill, and cleanse data in Hive.
  • Ingested the final table data from Cloudera into the Vertica server and then loaded it into Vertica tables.
  • Developed Python and Bash shell scripts to automate the end-to-end implementation process of an AI project.
  • Queried MySQL databases from Python using the MySQL Connector/Python and MySQLdb packages to retrieve necessary information for the company, resulting in a 75% retrieval rate.
  • Wrote more than 100 Python and batch scripts to automate the ETL runs every hour.
  • Migrated data from on-premises storage to AWS storage buckets.
  • Developed a Python script to transfer data from on-premises storage to AWS S3.
  • Developed a Python script to hit REST APIs and extract data to AWS S3.
  • Utilized Windows Task Scheduler to run Python scripts that generated reports at specific intervals and sent corresponding email alerts.
  • Involved in the deployment process using Jenkins and worked closely on fixing version issues.
  • Responsible for monthly data refresh for all the MVPD’s to generate advertising campaign data.
  • Designed and implemented partitioning (static and dynamic) and bucketing in Hive.

Environment: Hadoop, HDFS, Hive, Spark, Scala, Shell Scripting, Python, Jenkins, Tomcat, PostgreSQL, Stash, Cloudera, Oracle, Vertica, Sqoop, Apache Airflow.
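
For illustration, a minimal Apache Airflow sketch (Airflow 2.x style) of an hourly on-premises-to-S3 transfer like the ones described above; the DAG id, file paths, bucket, and schedule are hypothetical placeholders, and AWS credentials are assumed to be available to boto3 in the usual way.

```python
# Minimal Airflow sketch: an hourly DAG whose single task uploads an
# on-premises extract to S3 with boto3. Names and paths are hypothetical.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator


def upload_extract_to_s3():
    # Assumes AWS credentials via the standard environment or instance role.
    s3 = boto3.client("s3")
    s3.upload_file(
        Filename="/data/exports/daily_extract.csv",   # hypothetical local path
        Bucket="example-etl-landing",                 # hypothetical bucket
        Key="landing/daily_extract.csv",
    )


with DAG(
    dag_id="onprem_to_s3_sketch",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    upload_task = PythonOperator(
        task_id="upload_extract_to_s3",
        python_callable=upload_extract_to_s3,
    )
```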

Confidential

Big Data/Hadoop Developer

Responsibilities:

  • Gathered business requirements, worked on the definition and design of the data sourcing, and worked with the data warehouse architect on the development of logical data models.
  • Automated the cloud deployments using Python and AWS CloudFormation templates.
  • Worked with current application teams to understand existing applications, make migration recommendations, and define to-be architectures in AWS.
  • Worked on performance tuning of Apache NiFi workflows to optimize data ingestion speeds.
  • Used RabbitMQ as a messaging service to notify downstream consumers about the data ingested (a minimal pika sketch follows this section).
  • Involved in End-to-End implementation of ETL logic.
  • Pre-processed the ingested data using Apache Pig to eliminate bad records per business requirements, with the help of filter functions and user-defined functions.
  • Experience managing Azure Data Lake Storage (ADLS) and Data Lake Analytics, with an understanding of how to integrate with other Azure services; knowledge of U-SQL.
  • Monitored cluster health by setting up alerts using Nagios and Ganglia.
  • Worked on tickets opened by users regarding various incidents and requests.
  • Wrote UNIX shell scripts to automate jobs and scheduled cron jobs for job automation using crontab.
  • Transformed business problems into Big Data solutions and defined the Big Data strategy and roadmap; installed, configured, and maintained data pipelines.
  • Primarily involved in data migration using SQL, SQL Azure, Azure Storage, Azure Data Factory, SSIS, and PowerShell.
  • Designed the business requirement collection approach based on the project scope and SDLC methodology.
  • Created pipelines in ADF using linked services, datasets, and pipelines to extract, transform, and load data between sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, in both directions.
  • Performed data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Developed and worked on Oozie workflows for automation of Hive and Pig jobs.
  • Configured TeamCity for Continuous Integration/Continuous Deployment (CI/CD) of code onto the edge node.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Responsible for writing validation script using Shell scripting.
  • Used GitHub for version control repository.
  • Worked in XML transformation using XSLT.
  • Participated in and contributed to estimations and project planning with the team and architects.

Environment: Hadoop, HDFS, Apache NiFi, RabbitMQ, Oozie, Pig, Hive, Shell Scripting, Linux, Cloudera 5.4, Scala, Azure, AWS services.
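
For illustration, a minimal sketch of notifying downstream consumers over RabbitMQ with the pika client after an ingestion run, as described above; the broker host, queue name, and message payload are hypothetical.

```python
# Minimal pika sketch: publish a persistent ingestion-complete notification
# to a durable RabbitMQ queue. Host, queue, and payload are hypothetical.
import json

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq-host"))
channel = connection.channel()

# Durable queue so notifications survive a broker restart.
channel.queue_declare(queue="ingestion.notifications", durable=True)

message = {"dataset": "orders", "status": "ingested", "rows": 125000}
channel.basic_publish(
    exchange="",
    routing_key="ingestion.notifications",
    body=json.dumps(message),
    properties=pika.BasicProperties(delivery_mode=2),  # persistent message
)

connection.close()
```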

Confidential

Hadoop Developer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
  • Optimized Map Reduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
  • Worked with Linux systems and RDBMS databases on a regular basis to ingest data using Sqoop.
  • Created Hive tables and loaded transactional data from Teradata using Sqoop.
  • Performed K-means clustering, regression, and decision trees in R; worked on data cleaning and reshaping and generated segmented subsets using NumPy and Pandas in Python (a minimal scikit-learn sketch follows this section).
  • Worked with market mix modeling to strategize advertisement investments and better balance the ROI on advertisements.
  • Used Python libraries like NumPy, SciPy, pandas, scikit-learn, seaborn and Spark libraries PySpark, MLlib to develop a variety of models and algorithms for analytic purposes.
  • Created and ran Sqoop jobs with incremental load to populate Hive external tables.
  • Experience working with very large datasets distributed across large data clusters.
  • Created Hive tables and worked on them using HiveQL.
  • Involved in building applications using Maven, integrated with CI servers like Jenkins to build jobs.
  • Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data-based analytical solutions from disparate sources.
  • Created and maintained Technical documentation for executing Hive queries, Pig Scripts, Sqoop jobs.
  • Involved in Agile methodologies, daily scrum meetings, and sprint planning.

Environment: Hadoop, HDFS, Hive, Pig, Sqoop, Oozie, Maven, Shell Scripting, Cloudera.
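
For illustration, a minimal sketch of the data-cleaning and K-means segmentation workflow described above, using pandas and scikit-learn rather than R; the input file and feature columns are hypothetical.

```python
# Minimal sketch: clean a customer dataset with pandas, then segment it with
# K-means from scikit-learn. Input file and feature columns are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Load and clean: keep the numeric features of interest and drop incomplete rows.
customers = pd.read_csv("customer_spend.csv")            # hypothetical input
features = customers[["annual_spend", "visits", "tenure_months"]].dropna()

# Standardize before clustering so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(features)

# Segment customers into a small number of clusters.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
features = features.assign(segment=kmeans.fit_predict(scaled))

print(features.groupby("segment").mean())
```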
