
Big Data Engineer Resume


NC

SUMMARY:

  • 7+ years of experience in software/application development using Python and SQL, with an in-depth understanding of distributed systems architecture and parallel processing frameworks.
  • Deep knowledge and strong deployment experience in the Hadoop and Big Data ecosystem: HDFS, MapReduce, Spark, Pig, Sqoop, Hive, Oozie, Kafka, ZooKeeper, and HBase.
  • Knowledge of current trends in data technologies, data services, data virtualization, data integration, and Master Data Management.
  • Used various Hadoop distributions (Cloudera, Hortonworks, Amazon EMR, Microsoft Azure HDInsight) to fully implement and leverage new Hadoop features.
  • Constructed and manipulated large datasets of structured, semi-structured, and unstructured data, and supported systems/application architecture, using tools such as SAS, SQL, Python, R, Minitab, and Power BI to extract multi-factor interactions and drive change.
  • Converted ETL logic into SQL queries and created Informatica (INFA) mappings to load data into Netezza and Snowflake; played a key role in migrating Teradata objects into the Snowflake environment.
  • Experience in moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop.
  • Expertise in working with Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries (see the sketch following this summary).
  • Strong experience in tuning Spark applications and Hive scripts to achieve optimal performance.
  • Developed Spark applications using the RDD, Spark SQL, and DataFrame APIs.
  • Created Hive tables, loaded data into them, and wrote ad-hoc Hive queries that run internally on MapReduce and Spark.
  • Created logical and physical data models using Erwin based on requirement analysis.
  • Expertise in AWS resources such as EC2, S3, EBS, VPC, ELB, AMI, SNS, RDS, IAM, Route 53, Auto Scaling, CloudFormation, CloudWatch, and Security Groups.
  • Experience in optimizing EBS volumes and EC2 instances.
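The following is a minimal PySpark sketch of the Hive partitioning and bucketing work referenced above, assuming Spark with Hive support enabled; the database, table, column, and path names (sales_db.orders, order_date, customer_id, /data/raw/orders) are hypothetical placeholders rather than actual project objects.

```python
# Minimal PySpark sketch of loading a partitioned, bucketed Hive table.
# All names below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("orders-hive-load")
    .enableHiveSupport()          # lets Spark read/write Hive metastore tables
    .getOrCreate()
)

# Read raw data into a DataFrame (the source format is an assumption).
orders = spark.read.parquet("/data/raw/orders")

# Write a Hive table partitioned by date and bucketed by customer id,
# so downstream HQL queries can prune partitions and use bucketed joins.
(
    orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .bucketBy(32, "customer_id")
    .sortBy("customer_id")
    .saveAsTable("sales_db.orders")
)

# Ad-hoc HQL query that benefits from partition pruning.
spark.sql(
    "SELECT customer_id, SUM(amount) AS total "
    "FROM sales_db.orders WHERE order_date = '2020-01-01' "
    "GROUP BY customer_id"
).show()
```

Partitioning by date enables partition pruning on typical filter queries, while bucketing by a high-cardinality join key lets downstream joins avoid a full shuffle.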

PROFESSIONAL EXPERIENCE:

Confidential, NC

Big Data Engineer

Responsibilities:

  • Created Hive tables for loading and analyzing data; handled importing of data from various sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and moved data between MySQL and HDFS using Sqoop.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive; loaded data into Spark RDDs and performed advanced procedures such as text analytics and processing, using Spark's in-memory computation capabilities in Scala to generate the output response.
  • Developed ETL pipelines using EMR, Hive, Spark, Lambda, Scala, DynamoDB Streams, Amazon Kinesis Firehose, Redshift, and S3.
  • Developed Scala scripts using both DataFrames/SQL and RDD/MapReduce in Spark for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Wrote data normalization jobs for new data ingested into Redshift.
  • Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations.
  • Worked with the AWS cloud platform (EC2, IAM, EBS, CloudWatch, S3); deployed applications to EC2 using standard deployment techniques and worked on AWS infrastructure and automation.
  • Worked in a CI/CD environment, deploying applications on Docker containers.
  • Used AWS S3 buckets to store files, ingested the files into Snowflake tables using Snowpipe, and ran deltas using data pipelines.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python; developed Python code for tasks, dependencies, an SLA watcher, and a time sensor for each job (see the Airflow sketch following this list).
  • Optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Used the Spark Streaming API to perform transformations and actions on the fly for the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra (a streaming sketch follows the Environment line below).
  • Implemented the strategy for migrating Netezza-based analytical systems to Snowflake on AWS; worked with the architect on the final approach and streamlined the Informatica-Snowflake integration.
  • Created various reports using Tableau, based on requirements, with the BI team.
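Below is a minimal sketch of the kind of Airflow workflow described in the bullets above, assuming Airflow 2.x; the DAG id, schedule, callables, and SLA value are hypothetical placeholders, not the actual production workflow.

```python
# Sketch of an Airflow 2.x DAG with task dependencies, an SLA, and a time sensor.
# DAG id, schedule, and callables are hypothetical placeholders.
from datetime import datetime, time, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.sensors.time_sensor import TimeSensor


def extract_from_s3(**context):
    # Placeholder: pull the day's files from S3.
    print("extracting", context["ds"])


def load_into_snowflake(**context):
    # Placeholder: trigger the Snowflake load for the extracted files.
    print("loading", context["ds"])


default_args = {
    "owner": "data-eng",
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
    "sla": timedelta(hours=2),   # SLA watcher: misses are reported if a task runs long
}

with DAG(
    dag_id="s3_to_snowflake_daily",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    default_args=default_args,
    catchup=False,
) as dag:
    # Time sensor: wait until 06:00 UTC before starting the day's load.
    wait_for_window = TimeSensor(task_id="wait_for_window", target_time=time(6, 0))

    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    load = PythonOperator(task_id="load_into_snowflake", python_callable=load_into_snowflake)

    wait_for_window >> extract >> load
```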

Environment: SQL, Kafka, Python, Tableau, Cassandra, EMR, Hive, Spark, Lambda, Scala, DynamoDB Streams, Amazon Kinesis Firehose, Redshift, S3, Informatica, Snowflake, Hadoop, YARN.
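The Kafka-to-Cassandra path mentioned in the responsibilities above can be sketched as follows; this version uses Structured Streaming with the DataStax spark-cassandra-connector in Python, whereas the original work may have used DStreams in Scala, and the topic, keyspace, table, and schema names are hypothetical.

```python
# Sketch: read learner events from Kafka and persist each micro-batch to Cassandra.
# Requires the spark-sql-kafka and spark-cassandra-connector packages on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("learner-stream").getOrCreate()

schema = StructType([
    StructField("learner_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Read the learner events from Kafka in near real time.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "learner-events")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
    .select("e.*")
)


def write_to_cassandra(batch_df, batch_id):
    # Persist each micro-batch into the Cassandra table.
    (
        batch_df.write.format("org.apache.spark.sql.cassandra")
        .options(keyspace="learner", table="events")
        .mode("append")
        .save()
    )


query = (
    events.writeStream
    .foreachBatch(write_to_cassandra)
    .option("checkpointLocation", "/checkpoints/learner-events")
    .start()
)
query.awaitTermination()
```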

Confidential

Hadoop Developer

Responsibilities:

  • Utilized Sqoop, Kafka, Flume, and the Hadoop File System APIs to implement data ingestion pipelines.
  • Worked on real-time streaming, performing transformations on the data with Kafka and Spark Streaming; created storage with Amazon S3 and transferred data from Kafka topics into AWS S3.
  • Created Hive tables, loaded them with data, and wrote Hive queries to process the data; created partitions and used bucketing on Hive tables with the required parameters to improve performance, and developed Hive UDFs for business use cases (a UDF-style sketch follows this list).
  • Developed Hive scripts for source data validation and transformation; automated data loading into HDFS and Hive for pre-processing using Oozie.
  • Collaborated on data modeling, data mining, machine learning methodologies, advanced data processing, and ETL optimization.
  • Worked with various data formats, including Avro, SequenceFile, JSON, MapFile, Parquet, and XML.
  • Worked extensively with AWS components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
  • Used Apache NiFi to automate data movement between Hadoop components and to convert raw XML data into JSON and Avro.
  • Ran Hadoop on Cloudera Data Platform with services managed through Cloudera Manager; assisted with Hadoop administration and support activities, installing and configuring Apache big data tools and Hadoop clusters using Cloudera Manager.
  • Handled Hadoop production support tasks by analyzing application and cluster logs.
  • Used Agile Scrum methodology (Scrum Alliance) for development.
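A hedged sketch of UDF-based source data validation of the kind described above; the project's Hive UDFs may well have been written in Java, so this shows equivalent logic as a PySpark UDF registered for SQL use, with hypothetical table and column names.

```python
# Sketch: register a validation UDF and use it in a Hive-style query
# to flag bad records in a staged source table. Names are hypothetical.
import re

from pyspark.sql import SparkSession
from pyspark.sql.types import BooleanType

spark = (
    SparkSession.builder
    .appName("source-validation")
    .enableHiveSupport()
    .getOrCreate()
)


def is_valid_msisdn(value):
    # Example validation rule: 10-15 digits with an optional leading '+'.
    return bool(value) and re.fullmatch(r"\+?\d{10,15}", value) is not None


spark.udf.register("is_valid_msisdn", is_valid_msisdn, BooleanType())

# Flag invalid records before loading downstream.
invalid = spark.sql(
    "SELECT * FROM staging.subscribers WHERE NOT is_valid_msisdn(phone_number)"
)
print("invalid rows:", invalid.count())
```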

Environment: Hadoop, HDFS, AWS, Vertica, Scala, Kafka, MapReduce, YARN, Drill, Spark, Pig, Hive, Java, NiFi, HBase, MySQL, Kerberos, Maven.

Confidential

Big Data Developer

Responsibilities:

  • Built a scalable, distributed data lake for Confidential's real-time and batch analytical needs.
  • Designed, reviewed, and optimized data transformation processes using Apache Storm.
  • Managed jobs using fair scheduling and developed job processing scripts using Control-M workflows.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Loaded data into Spark RDDs and performed in-memory computation to generate the output response; optimized existing Hadoop algorithms using Spark Context, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing using Spark's in-memory computing capabilities in Scala.
  • Imported data from Kafka consumers into HBase using Spark Streaming.
  • Used ZooKeeper and Oozie operational services to coordinate the cluster and schedule workflows; used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate Java MapReduce, Hive, Sqoop, and system-specific jobs.
  • Handled large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations (see the broadcast join sketch following this list).
  • Migrated legacy MapReduce programs to Spark transformations using Spark and Scala.
  • Worked on a POC comparing Impala processing time with Apache Hive for batch applications, to decide whether to adopt Impala in the project.
  • Worked extensively with Sqoop to import metadata from Oracle.
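The broadcast-join pattern referenced in the bullets above can be sketched as below; the dataset paths and join keys are hypothetical.

```python
# Sketch: broadcast the small reference dataset to join it against a large
# dataset during ingestion without shuffling the large side. Names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("ingest-join").getOrCreate()

# Large fact-like dataset being ingested.
transactions = spark.read.parquet("/data/ingest/transactions")

# Small reference/dimension dataset that fits in executor memory.
merchants = spark.read.parquet("/data/ref/merchants")

# Broadcasting the small side avoids a shuffle of the large side.
enriched = transactions.join(broadcast(merchants), on="merchant_id", how="left")

# Repartition by a well-distributed key before writing partitioned output.
(
    enriched.repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("/data/curated/transactions")
)
```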

Environment: Apache Storm, Spark, Hadoop, Scala, Kafka, ZooKeeper, MapReduce, Hive, Sqoop, HBase, Impala, Oozie, Oracle, YARN, text analytics.

Confidential

Data Analyst/Modeler

Responsibilities:

  • Performed data analysis, data modeling, data migration, and data profiling using complex SQL on various source systems, including Oracle and Teradata (a profiling sketch follows this list).
  • Built applications based on large datasets in MarkLogic.
  • Translated business requirements into working logical and physical data models for data warehouse, data mart, and OLAP applications.
  • Analyzed data lineage processes to identify vulnerable data points, control gaps, data quality issues, and an overall lack of data governance.
  • Worked on data cleansing and standardization using the cleanse functions in Informatica MDM.
  • Designed Star and Snowflake data models for the Enterprise Data Warehouse using Erwin.
  • Validated and updated the appropriate LDMs to reflect process mappings, screen designs, use cases, the business object model, and the system object model as they evolved and changed.
  • Maintained the data model and synchronized it with changes to the database.
  • Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
  • Extensively involved in the modeling and development of the reporting data warehousing system.
  • Designed database tables and created table- and column-level constraints using the suggested naming conventions for constraint keys.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Used the ETL tool BO Data Services (BODS) to extract, transform, and load data into data warehouses from sources such as relational databases, application systems, temp tables, and flat files.
  • Wrote packages, procedures, functions, and exception handling in PL/SQL; reviewed database programming for triggers, exceptions, functions, packages, and procedures.
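A sketch of the kind of SQL-based data profiling described above, run against Oracle from Python via cx_Oracle; the connection details and the schema, table, and column names are hypothetical placeholders, and the same style of query would apply to the Teradata sources as well.

```python
# Sketch: run a profiling query against an Oracle source system and print
# basic completeness/uniqueness metrics. Credentials and names are placeholders.
import cx_Oracle

PROFILE_SQL = """
SELECT
    COUNT(*)                                        AS row_count,
    COUNT(DISTINCT customer_id)                     AS distinct_customers,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)  AS null_emails,
    MIN(created_dt)                                 AS first_created,
    MAX(created_dt)                                 AS last_created
FROM crm.customers
"""

with cx_Oracle.connect(user="profiler", password="***", dsn="dwhost/ORCLPDB1") as conn:
    with conn.cursor() as cur:
        cur.execute(PROFILE_SQL)
        columns = [d[0] for d in cur.description]
        for row in cur.fetchall():
            print(dict(zip(columns, row)))
```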

Environment: MarkLogic, OLAP, Oracle, Teradata, Erwin, ETL, NoSQL, Star and Snowflake data models, PL/SQL.
