
Data Engineer/Big Data Engineer Resume


Madison, WI

SUMMARY

  • Around 8+ years of working experience as a Data Engineer with highly proficient knowledge in Data Analysis.
  • Experienced in Big Data work on Hadoop, Spark, PySpark, Hive, HDFS and other NoSQL platforms.
  • Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
  • Hands-on experience with Amazon Web Services, including provisioning and maintaining AWS resources such as EMR, S3 buckets, EC2 instances, RDS and others.
  • Experienced in Informatica ILM (Information Lifecycle Management) and its tools.
  • Efficient in all phases of the development lifecycle, including Data Cleansing, Data Conversion, Data Profiling, Data Mapping, Performance Tuning and System Testing.
  • Supported ad-hoc business requests, developed Stored Procedures and Triggers, and extensively used Quest tools like TOAD.
  • Good understanding and exposure to Python programming.
  • Excellent working experience in Scrum/Agile framework and Waterfall project execution methodologies.
  • Implemented various algorithms for analytics using Cassandra with Spark and Scala.
  • Extensive experience working with business users/SMEs as well as senior management.
  • Experience in developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
  • Experienced in technical consulting, data modeling, data governance, and the design, development and implementation of solutions.
  • Experience in installation, configuration, supporting and managing the Cloudera Hadoop platform along with CDH4 and CDH5 clusters.
  • Good understanding of Ralph Kimball (Dimensional) and Bill Inmon (Relational) modeling methodologies.
  • Strong experience in using MS Excel and MS Access to dump the data and analyze based on business needs.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts (a brief PySpark sketch follows this list).
  • Hands on Experience in MapR and Hortonworks Hadoop Distributions.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • SnowPro Core Certified. Experience with Snowflake Multi-Cluster Warehouses and building Snowpipe, loading data from the local system and AWS S3 buckets, in-depth knowledge of Data Sharing in Snowflake, in-depth knowledge of Snowflake Database, Schema, and Table structures, and experience in using Snowflake Clone and Time Travel.
  • In-depth understanding of Snowflake cloud technology, Snowflake Multi-cluster Size and Credit Usage.
  • Experience with streaming/messaging platforms (Kafka, MQ).
  • Hands on experience in Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Hive and Sqoop.
  • Experience in architecture and implementation of large and highly complex projects using Cloudera or Hortonworks
  • Understanding Business Requirements, flow of application and working experience on Bug fixing.
  • Independently perform complex troubleshooting, root-cause analysis, and solution development.
  • Well-organized, versatile, flexible, self-motivated team player with a collaborative approach and team-building skills, proficient in grasping new technical concepts quickly and applying them productively.
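
The conversion pattern referenced above can be illustrated with a minimal PySpark sketch. This is an assumed example rather than code from an actual engagement; the table and column names (sales, region, amount) are hypothetical placeholders.

```python
# Express a Hive/SQL aggregation as Spark DataFrame transformations.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-example")
         .enableHiveSupport()   # allows reading tables registered in the Hive metastore
         .getOrCreate())

# Equivalent Hive/SQL query:
#   SELECT region, SUM(amount) AS total_amount FROM sales GROUP BY region;
sales_df = spark.table("sales")
totals_df = (sales_df
             .groupBy("region")
             .agg(F.sum("amount").alias("total_amount")))

totals_df.show()
```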

TECHNICAL SKILLS

Technologies: Hadoop (Cloudera, Hortonworks, Pivotal HD), Apache Spark, Apache Kafka, Apache HBase, Flume, Talend, Hive, Pig, Sqoop, Storm, Mahout, Oozie, Tableau, JavaBeans, Servlets, JSP, JDBC, EJB, JNDI, JMS, RMI.

Reporting Tools: Tableau, Power BI, SSRS

Database: Cassandra, HBase, Oracle 11g, SQL Server 2008, MySQL

IDE: Eclipse, NetBeans, IBM RAD, JBuilder.

Design Methodology: UML, Waterfall, Agile

Operating Systems: Windows, Linux, UNIX

Query Languages: SQL, PL/SQL.

Programming Languages: Python, Java, Scala, Perl, and UNIX Shell Scripting.

Design patterns: Business Delegate, Business Object, Value Object, Front Controller, Data Access Object, Factory, Singleton, Session Facade.

Tools: BEA WebLogic, JBoss, IBM WebSphere Application Server 6.1, Tomcat 6.0, JUnit 4.0, Log4j, Mercury Quality Centre, Rational ClearQuest, ANT, Maven, SVN, Toad

Design & Control: UML, Rational Rose, CVS, ClearCase

PROFESSIONAL EXPERIENCE

Confidential, Madison, WI

Data Engineer/Big Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop components.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
  • Joined various tables in Cassandra using Spark and Scala and ran analytics on top of them.
  • Participated in various upgrade and troubleshooting activities across the enterprise.
  • Knowledge in performance troubleshooting and tuning of Hadoop clusters.
  • Applied advanced Spark procedures like text analytics and processing using in-memory processing.
  • Solid understanding of Hadoop HDFS, MapReduce and other ecosystem projects.
  • Implemented Apache Drill on Hadoop to join data from SQL and NoSQL databases and store it in Hadoop.
  • Created an architecture stack blueprint for data access with the NoSQL database Cassandra.
  • Brought data from various sources into Hadoop and Cassandra using Kafka.
  • Experienced in using Tidal enterprise scheduler and Oozie Operational Services for coordinating the cluster and scheduling workflows.
  • Working on Snowflake modeling and highly proficient in data warehousing techniques for data cleansing, Slowly Changing Dimension phenomenon, surrogate key assignment and change data capture.
  • Extensively used Star and Snowflake Schema methodologies in building and designing the logical data model into Dimensional Models
  • Implemented Partitioning, Dynamic Partitions and Buckets in Hive for efficient data access (a PySpark sketch of this pattern follows this list).
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team Using Tableau.
  • Implemented Composite server for the data virtualization needs and created multiples views for restricted data access using a REST API.
  • Devised and led the implementation of a next-generation architecture for more efficient data ingestion and processing.
  • Created and implemented various shell scripts for automating the jobs.
  • Implemented Apache Sentry to restrict access to the Hive tables at a group level.
  • Employed the Avro format for the entire data ingestion for faster operation and less space utilization.
  • Experienced in managing and reviewing Hadoop log files.
  • Worked in an Agile environment and used the Rally tool to maintain the user stories and tasks.
  • Worked with Enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used Spark for Parallel data processing and better performances.
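
An illustrative PySpark sketch of the partitioning and bucketing pattern referenced in this list, written against Spark's DataFrameWriter rather than raw HiveQL; the table and column names (web_events_raw, web_events_part, event_date, user_id) are hypothetical placeholders.

```python
# Write a table partitioned by date and bucketed by user id for efficient access.
# All names are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partitioning-example")
         .enableHiveSupport()
         .getOrCreate())

events_df = spark.table("web_events_raw")

(events_df
 .write
 .mode("overwrite")
 .partitionBy("event_date")      # one directory per event_date value
 .bucketBy(32, "user_id")        # 32 buckets hashed on user_id
 .sortBy("user_id")
 .format("parquet")
 .saveAsTable("web_events_part"))

# Queries filtering on event_date now prune partitions, and joins on
# user_id can reduce shuffling thanks to bucketing.
```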

Environment: Apache HDFS, MapReduce, Hive, Pig, Impala, Cassandra 5.04, Spark, Scala, Solr, Java, SQL, Tableau, Zookeeper, Sqoop, Teradata, Python, CentOS, Pentaho, Snowflake, Hadoop, Kafka.

Confidential, Thousand Oaks, CA

AWS Snowflake Data Engineer

Responsibilities:

  • Played a key role in Migrating Teradata objects into the Snowflake environment.
  • Experience with Snowflake Multi-Cluster Warehouses and Snowflake Virtual Warehouses.
  • Experience in building Snowpipe (a minimal sketch follows this list).
  • Involved in migrating objects from Teradata to Snowflake.
  • Worked as a Data Engineer to drive projects using Spark, SQL and the Azure cloud environment.
  • Worked on data governance to provide operational structure to previously ungoverned data environments.
  • Participated in the requirement gathering sessions to understand the expectations and worked wif system analysts to understand the format and patterns of the upstream source data.
  • Performed data migration from an RDBMS to a NoSQL database, providing a complete picture of data deployed across various data systems.
  • Designed and implemented end-to-end data solutions (storage, integration, processing, and visualization) in AWS.
  • Designed and implemented effective Analytics solutions and models with Snowflake.
  • Enterprise Data Lake was designed and set up to enable a variety of use cases, covering analytics, processing, storing, and reporting on large amounts of data.
  • Involved in Data Modeling using Star/Snowflake Schema Design using Erwin.
  • Collaborated with the clients and solution architect to maintain quality data points in the source by carrying out activities such as cleansing, transformation, and maintaining integrity in a relational environment.
  • Created and set up a self-hosted integration runtime on virtual machines to access private networks.
  • Working on building visuals and dashboards using Power BI reporting tool.
  • Built Apache Airflow with AWS to analyze multi-stage machine learning processes with Amazon SageMaker tasks.
  • Developed streaming pipelines using Apache Spark wif Python.
  • Designed and developed Security Framework to provide fine grained access to objects in AWS S3 using AWS Lambda
  • Used AWS EMR to move large amounts of data (Big Data) into other platforms such as AWS data stores, Amazon S3 and Amazon DynamoDB.
  • Created PL/SQL packages and Database Triggers and developed user procedures and prepared user manuals for the new programs.
  • Wrote UDFs in Scala and PySpark to meet specific business requirements.
  • Worked in an Agile environment and used the Rally tool to maintain the user stories and tasks.
  • Worked on Informatica tools such as PowerCenter, MDM, Repository Manager and Workflow Monitor.
  • Created Kibana dashboards and combined several source and target systems into Elasticsearch for real-time analysis and end-to-end transaction tracking.
  • Worked with Enterprise data support teams to install Hadoop updates, patches, and version upgrades as required, and fixed problems that arose after the upgrades.
  • Implemented test scripts to support test-driven development and continuous integration.
  • Used Spark for Parallel data processing and better performances.
  • Used Python to extract data for Web scraping.
  • Conducted numerous training sessions, demonstration sessions on Big Data.
  • Built campaigns in UNICA, to generate custom offers.
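
A minimal sketch of building a Snowpipe that auto-ingests files landing in an S3 stage, using the Snowflake Python connector. All object names (the stage, pipe, table, bucket path, and the S3_INT storage integration) and the connection parameters are hypothetical placeholders, and the storage integration is assumed to be configured separately.

```python
# Create an external S3 stage and an auto-ingest Snowpipe via the Snowflake
# Python connector. All names and credentials below are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345",        # placeholder account identifier
    user="ETL_USER",
    password="********",
    warehouse="LOAD_WH",
    database="ANALYTICS",
    schema="RAW",
)
cur = conn.cursor()

# External stage pointing at the S3 landing path (assumes a storage
# integration named S3_INT was configured by an administrator).
cur.execute("""
    CREATE STAGE IF NOT EXISTS my_s3_stage
    URL = 's3://example-bucket/orders/'
    STORAGE_INTEGRATION = S3_INT
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# Snowpipe that copies any new file in the stage into the target table.
cur.execute("""
    CREATE PIPE IF NOT EXISTS orders_pipe AUTO_INGEST = TRUE AS
    COPY INTO raw.orders
    FROM @my_s3_stage
""")

cur.close()
conn.close()
```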

Environment: AWS EMR, S3, RDS, Redshift, Lambda, Boto3, DynamoDB, Snowflake, Amazon SageMaker, Apache Spark, HBase, Apache Kafka, Hive, Sqoop, MapReduce, Apache Pig, Python, Tableau, UNICA, Kibana, Informatica.

Confidential

Data Engineer

Responsibilities:

  • As a Data Engineer, assisted in leading the plan, build, and run states within the Enterprise Analytics Team.
  • Led the architecture and design of data processing, warehousing and analytics initiatives.
  • Engaged in solving and supporting real business issues with Hadoop Distributed File System (HDFS) and open-source framework knowledge.
  • Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
  • Involved in various phases of development; analyzed and developed the system following the Agile Scrum methodology.
  • Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining the data architecture.
  • Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
  • Built the data pipelines that enable faster, better, data-informed decision-making within the business.
  • Used REST APIs with Python to ingest data from other sites into BigQuery.
  • Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
  • Performed Data transformations in Hive and used partitions, buckets for performance improvements.
  • Optimized Hive queries to extract the customer information from HDFS.
  • Involved in scheduling Oozie workflow engine to run multiple Hive jobs.
  • Continuously monitored and managed data pipeline (CI/CD) performance alongside applications from a single console with GCP.
  • Developed Spark scripts using Python and Bash shell commands as per the requirements.
  • Worked on POC to check various cloud offerings including Google Cloud Platform (GCP).
  • Developed a POC for project migration from an on-prem MapR Hadoop system to GCP.
  • Compared self-hosted Hadoop with GCP's Dataproc, and explored Bigtable (managed HBase) use cases and performance evaluation.
  • Wrote a Python program to maintain raw file archival in a GCS bucket.
  • Implemented business logic by writing UDFs and configuring cron jobs.
  • Designed Google Cloud Dataflow jobs that move data within a 200 PB data lake.
  • Implemented scripts that load Google BigQuery data and run queries to export data (a load-and-validate sketch follows this list).
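
A sketch of the kind of BigQuery load-and-validate script referenced above, using the google-cloud-bigquery client; the project, bucket, dataset, and table names are hypothetical placeholders.

```python
# Load a raw file from GCS into BigQuery, then run a simple row-count
# validation. Bucket, dataset, and table names are hypothetical placeholders.
from google.cloud import bigquery

client = bigquery.Client()

table_id = "my-project.staging.orders_raw"
source_uri = "gs://example-raw-bucket/orders/2021-01-01.csv"

load_job = client.load_table_from_uri(
    source_uri,
    table_id,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
load_job.result()  # wait for the load to finish

# Validation: compare the table's reported row count with a COUNT(*) query.
loaded_rows = client.get_table(table_id).num_rows
query_rows = next(
    client.query(f"SELECT COUNT(*) AS n FROM `{table_id}`").result()
).n

assert loaded_rows == query_rows, "row count mismatch between load and query"
print(f"Validated {loaded_rows} rows in {table_id}")
```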

Environment: Hadoop 3.3, Spark 3.1, Python, GCP, Data Lake, GCS, HBase, Oozie, Hive, CI/CD, BigQuery, REST API, Agile Methodology

Confidential

Data Analyst

Responsibilities:

  • Worked as a Data Engineer collaborating with other Product Engineering team members to develop, test and support data-related initiatives.
  • Developed understanding of key business, product and user questions.
  • Followed Agile methodology for the entire project.
  • Defined the business objectives comprehensively through discussions with business stakeholders and functional analysts and by participating in requirement collection sessions.
  • Provided a summary of the project's goals, the specific expectations of business users from BI, and how they align with the project goals; led the estimation, reviewed the estimates, identified the complexities, and communicated them to all the stakeholders.
  • Responsible for data governance rules and standards to maintain the consistency of the business element names in the different data layers.
  • Migrated the on-premises environment to the cloud using MS Azure.
  • Worked on migration of data from On-prem SQL server to Cloud databases (Azure Synapse Analytics (DW) & Azure SQL DB).
  • Performed data flow transformation using the data flow activity.
  • Performed ongoing monitoring, automation, and refinement of data engineering solutions.
  • Created pipelines, data flows and complex data transformations and manipulations using ADF and PySpark with Databricks.
  • Developed mapping document to map columns from source to target.
  • Created Azure Data Factory (ADF) pipelines using Azure PolyBase and Azure Blob storage.
  • Performed ETL using Azure Databricks.
  • Wrote UNIX shell scripts to support and automate the ETL process.
  • Worked on Python scripting to automate the generation of scripts; data curation was done using Azure Databricks.
  • Used Stored Procedure, Lookup, Execute Pipeline, Data Flow, Copy Data, and Azure Function activities in ADF.
  • Worked on Kafka to bring the data from data sources and keep it in HDFS systems for filtering.
  • Created several Databricks Spark jobs with PySpark to perform table-to-table operations (a minimal sketch follows this list).
  • Working on building visuals and dashboards using Power BI reporting tool.
  • Providing 24/7 On-call Production Support for various applications.
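
A minimal PySpark sketch of a Databricks-style table-to-table job as referenced above: read a source table, transform it, and write the result to a curated table. The table and column names (raw.orders, curated.daily_revenue, status, order_ts, amount) are hypothetical placeholders.

```python
# Read a source table, aggregate it, and write the result to a curated table.
# Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("table-to-table-job").getOrCreate()

orders = spark.table("raw.orders")

daily_revenue = (orders
                 .filter(F.col("status") == "COMPLETED")
                 .withColumn("order_date", F.to_date("order_ts"))
                 .groupBy("order_date")
                 .agg(F.sum("amount").alias("revenue")))

# On Databricks the default table format is Delta; "overwrite" replaces the
# previous contents of the curated table.
(daily_revenue
 .write
 .mode("overwrite")
 .saveAsTable("curated.daily_revenue"))
```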

Environment: Hadoop, Spark, Kafka, Azure Databricks, ADF, Python, PySpark, HDFS, ETL, Agile & Scrum meetings
