Data Modeler/Engineer Resume

Negaunee, MI

SUMMARY

  • 9+ years of programming experience across all phases of the Software Development Life Cycle (SDLC).
  • 4+ years of Big Data experience building highly scalable data analytics applications.
  • Strong experience working with Hadoop ecosystem components such as HDFS, MapReduce, Spark, HBase, Oozie, Hive, Sqoop, Pig, Flume, and Kafka.
  • Good hands-on experience working with various Hadoop distributions, mainly Cloudera (CDH), Hortonworks (HDP), and Amazon EMR.
  • Good understanding of Distributed Systems architecture and design principles behind Parallel Computing.
  • Expertise in developing production-ready Spark applications using the Spark Core, DataFrames, Spark SQL, Spark ML, and Spark Streaming APIs.
  • Azure Data Engineer with experience in designing, building, and maintaining data pipelines and architectures using Azure data services. Skilled in developing and deploying scalable, secure, and highly available data solutions to support business objectives.
  • Proficient in SQL, Python, and Azure DevOps. Proven ability to work effectively in a fast-paced, dynamic environment, delivering data-driven insights and business value to stakeholders.
  • Proficient in using Flyway, an open-source database migration tool, for managing database changes.
  • Strong knowledge of Google Cloud services, including BigQuery, Cloud Storage, and Cloud Dataflow.
  • Proven ability to translate business requirements into efficient and scalable data solutions, delivering data-driven insights to stakeholders.
  • Worked extensively on Hive for building complex data analytical applications.
  • Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, distributed cache, compression techniques, and multiple Hadoop input and output formats.
  • Knowledge of best practices and common patterns for writing Terraform code, including using variables, modules, and remote state management.
  • Good experience working with AWS Cloud services such as S3, EMR, Redshift, Athena, and Glue Metastore.
  • Deep understanding of performance tuning and partitioning for optimizing Spark applications.
  • Built real-time data workflows using Kafka, Spark Streaming, and HBase.
  • Extensive knowledge of NoSQL databases such as HBase, Cassandra, and MongoDB.
  • Solid experience working with CSV, text, sequence file, Avro, Parquet, ORC, and JSON data formats.
  • Extensive experience performing ETL on structured and semi-structured data using Pig Latin scripts.
  • Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading, and storing data.
  • Experience using the Hadoop ecosystem and visualizing the processed data in Tableau.
  • Good knowledge of core programming concepts such as algorithms, data structures, and collections.
  • Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful services, JDBC, JavaScript, XML, and HTML.
  • Extensive experience developing and deploying applications on WebLogic, Apache Tomcat, and JBoss.
  • Development experience with RDBMS, including writing SQL queries, views, stored procedures, and triggers.
  • Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).

PROFESSIONAL EXPERIENCE

Confidential, Negaunee, MI

Data Modeler/Engineer

Responsibilities:

  • Responsible for ingesting large volumes of user behavioral data and customer profile data into the analytics data store.
  • Developed custom multi-threaded Java-based ingestion jobs as well as Sqoop jobs for ingesting data from FTP servers and data warehouses.
  • Developed Scala-based Spark applications for data cleansing, event enrichment, data aggregation, de-normalization, and data preparation for consumption by the machine learning and reporting teams.
  • Troubleshot Spark applications to make them more error tolerant.
  • Fine-tuned Spark applications to improve the overall processing time of the pipelines.
  • Wrote Kafka producers to stream data from external REST APIs to Kafka topics.
  • Wrote Spark-Streaming applications to consume the data from Kafka topics and write the processed streams to HBase.
  • Designed and implemented data storage solutions in Azure, such as Azure Data Lake, Azure Blob Storage, and Azure SQL Database.
  • Implemented data processing pipelines using Azure services such as Azure Data Factory, Azure Stream Analytics, and Azure Databricks.
  • Skilled in configuring and integrating Flyway with databases such as MySQL, PostgreSQL, Oracle, and SQL Server.
  • Collaborated with data scientists and data analysts to understand their data needs and design solutions that meet their requirements.
  • Developed and maintained data integration solutions covering data ingestion from various sources, data transformation, and data enrichment.
  • Experienced in handling large datasets using Spark's in-memory capabilities, broadcast variables, efficient joins, transformations, and other features.
  • Worked extensively with Sqoop for importing data from Oracle.
  • Experience working with EMR clusters in the AWS cloud, along with S3, Redshift, and Snowflake.
  • Created Hive tables, then loaded and analyzed data using Hive scripts.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Good experience with continuous integration of applications using Bamboo.
  • Used reporting tools such as Tableau, connected to Impala, to generate daily data reports.
  • Collaborated with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
  • Designed, developed, and maintained user-friendly data visualizations and dashboards using complex datasets from different sources.
  • Documented operational problems in JIRA, following standards and procedures.

Environment: AWS EMR, Spark, Scala, Python, Hive, Sqoop, Oozie, Kafka, YARN, JIRA, S3, Redshift, Athena, Shell Scripting, GitHub, Maven, Azure SQL Database, Azure Data Factory, Azure Stream Analytics, Azure Cosmos DB, Azure Data Lake Storage, and Flyway

Confidential, Eagan, MN

Data Engineer AWS

Responsibilities:

  • Built end-to-end Scala-based Spark applications for cleansing, auditing, and transforming raw data feeds from multiple report suites.
  • Proficient in using Terraform, an open-source infrastructure as code tool, for provisioning and managing infrastructure resources.
  • Experience writing Terraform modules to automate the creation and deployment of resources on cloud providers such as AWS, Azure, and Google Cloud.
  • Familiarity with Terraform's declarative syntax for defining infrastructure resources and its state management system.
  • Skilled in using Terraform with continuous integration and delivery tools such as Jenkins, Travis CI, and GitLab CI/CD.
  • Familiarity with other infrastructure as code tools such as Ansible, Chef, and Puppet.
  • Worked extensively with Hive, creating numerous internal and external tables as part of the analysis requirements.
  • Wrote custom UDFs in Hive according to business requirements.
  • Hands-on experience with data loading techniques using Sqoop.
  • Experience using Amazon Simple Workflow Service (SWF) for workflow design and scheduling.
  • Executed a full CI/CD pipeline by integrating SCM (Git) with the automated build and test tool Gradle, deploying with Jenkins (Declarative Pipeline), and Dockerizing containers in production; also worked with a few DevOps tools such as AWS CloudFormation, AWS CodePipeline, Terraform, and Kubernetes.
  • Used Spark DataFrames, Spark SQL, and Spark MLlib extensively.
  • Created batch and real-time pipelines using Spark as the main processing framework.
  • Integrated Kafka with Spark Streaming for real-time data processing.
  • Worked closely with the business to translate business requirements into technical requirements.
  • Hands-on experience working with AWS Cloud services such as EMR, S3, and Redshift.
  • Participated in design reviews and daily project scrums.

Environment: AWS Cloud, Hadoop, Terraform, Spark, Hive, Teradata, Oozie, Spring Boot, JUnit, IntelliJ, Maven, GitHub, Docker, and Kubernetes

Confidential, Fort Worth, TX

Sr. Hadoop Developer

Responsibilities:

  • Loaded and transformed large sets of structured and semi-structured data and analyzed them by running Hive queries.
  • Converted existing MapReduce programs to Spark to cleanse the data in HDFS obtained from multiple data sources to make it suitable for ingestion into Hive for analysis.
  • Designed and developed Spark jobs to process data coming in different file formats like XML, CSV, and JSON.
  • Experience with Hive partitioning and bucketing, performing joins on Hive tables, and using Hive SerDes such as Regex, JSON, and Avro.
  • Executed cluster upgrade tasks on the staging platform before applying them to the production cluster.
  • Performed maintenance, monitoring, deployments, and upgrades across infrastructure that supports all our Hadoop clusters.
  • Optimized Hive analytics SQL queries, created tables and views, wrote custom UDFs, and implemented Hive-based exception processing.
  • Transferred large datasets between Teradata and HDFS using Sqoop, including incremental imports.
  • Stored and processed data using low-level Java APIs to ingest data directly into HBase.
  • Exported the analyzed data to relational databases using Sqoop to generate reports for the BI team.
  • Experience in managing and reviewing Hadoop log files.
  • Installed and configured various components of Hadoop ecosystem.
  • Replaced the default Derby metadata store for Hive with MySQL.
  • Designed and implemented data storage solutions using GCP services such as Google Cloud Storage, BigQuery, and Cloud SQL.
  • Implemented data processing pipelines using GCP services such as Cloud Dataflow, Cloud Dataproc, and Cloud Pub/Sub.
  • Ensured data quality and security by implementing data validation and monitoring processes.
  • Collaborated with data scientists and data analysts to understand their data needs and design solutions that meet their requirements.
  • Developed and maintained data integration solutions for ingestion from various sources, data transformation, and data enrichment.
  • Implemented data migration and data engineering solutions using Azure products and services (Azure Data Lake Storage, Azure Data Factory, Azure Functions, Event Hub, Azure Stream Analytics, Azure Databricks, etc.) and traditional data warehouse tools.
  • Worked on multiple aspects of the development lifecycle: design, cloud engineering (infrastructure, network, security, and administration), ingestion, preparation, data modeling, testing, CI/CD pipelines, performance tuning, deployments, consumption, BI, alerting, and production support.
  • Provided technical leadership, collaborating within a team environment as well as working independently.

Environment: CDH, Pig, Hive, MapReduce, YARN, Oozie, Flume, Sqoop, Impala, Spark, Scala, SQL Server, Oracle, Shell Scripting, Google Cloud Storage, BigQuery, Cloud SQL, Cloud AI Platform, Cloud Pub/Sub, Azure Data Lake, Azure Power Apps, and Power BI

Confidential

Hadoop Developer

Responsibilities:

  • Actively participated in all phases of the Software Development Life Cycle (SDLC), from implementation to deployment.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and log files.
  • Responsible for managing the test data coming from different sources.
  • Analyzed data using Hadoop components Hive and Pig.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Imported data from RDBMS into HDFS and exported it back using Sqoop.
  • Involved in loading data from UNIX file system to HDFS.
  • Responsible for creating Hive tables, loading data, and writing Hive queries.
  • Created Hive external tables, loaded data into them, and queried the data using HQL.
  • Imported data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Created and maintained technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
  • Extracted data from Teradata into HDFS using Sqoop.
  • Exported the patterns analyzed back to Teradata using Sqoop.
  • Experience monitoring system metrics and logs for problems while adding, removing, or updating Hadoop cluster nodes.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, and used Oozie workflows for batch processing and dynamic workflow scheduling.

Environment: Hadoop, Spark, Scala, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, Zookeeper

Confidential 

Data Modeler

Responsibilities:

  • Prepared and reviewed business requirements and client documentation.
  • Produced functional specifications and led weekly meetings with developers and business units to discuss outstanding technical issues and deadlines that had to be met.
  • Involved in performing data mining tasks on historical migrated data from legacy systems.
  • Wrote requirements for designing data marts that were used as the source for the various reporting systems in the company.
  • Performed dimensional profitability analysis and planned financial reporting using the SAS Financial Intelligence tool.
  • Experience with the integration and cleansing of financial data, budgeting, forecasting, scorecarding, simulation, and risk management using SAS Financial.
  • Adopted OOD and RAD methodologies for faster creation of the prototype.
  • Performed data modeling by creating logical and physical models of the database system. Developed conversion mappings from legacy customer master data (customer accounts, profiles, contacts, and telephone data) to the Oracle Financials customer interface.
  • Created SQL*Loader scripts to load legacy data into Oracle staging tables.
  • Created staging tables necessary to store validated customer data prior to loading data into customer interface tables.
  • Validation covered organizations, customers, and addresses (duplicates and nulls), ship-to and bill-to validation, and checks that customer profile classes are set up in Oracle and that site use codes exist in Oracle.
  • Created batch jobs with UNIX shell scripts to create indexes and analyze tables once the imports completed.
  • Designed star schema dimensional models and created fact tables and dimension tables.
  • Developed views, functions, procedures, and packages using PL/SQL and SQL to transform data from the source staging area to the target staging area.
  • Loaded data from flat files into database tables using SQL*Loader.
  • For optimization, performed database capacity planning, created materialized views, and tuned and optimized SQL statements.
  • Designed, developed, and reviewed ETL mappings from source to staging and from staging to the data warehouse.
  • Designed and developed ETL to extract data from external sources (Oracle, Excel files).

Environment: Oracle 10g/9.2, SQL Server 2000, SQL, PL/SQL, Export/Import, SQL*Loader, Korn Shell Script, HP UNIX, Sun Solaris, TOAD, Windows 2000, Erwin, Business Objects, SAS Financial Intelligence, and PVCS.
