
Bigdata Engineer Resume


SUMMARY

  • I am a Big Data Engineer/Data Analyst enthusiast and an AWS Certified Cloud Practitioner, inspired by the quote “Trust me, I never lose; I either win or learn.”
  • Highly qualified professional with about 4 years of experience in building Big Data applications and creating data lakes to manage structured and unstructured data.
  • Strong Experience in Big Data technologies (Spark 1.6, Spark SQL, PySpark, Hadoop, HDFS, Hive, Sqoop, Kafka, Spark Streaming).
  • Experienced in writing and optimizing diverse SQL queries, with good knowledge of RDBMS including MySQL, Hive, and RDS.
  • Well versed with various Ab Initio Transform, Partition, Departition, Dataset, and Database components.
  • Good knowledge of building and maintaining highly scalable and fault-tolerant infrastructure in an AWS environment spanning multiple availability zones.
  • Experienced in using Python to manipulate data for loading and extraction, and worked with Python libraries for data analysis.
  • Excellent understanding of Agile and Scrum development methodologies.
  • Passionate about gleaning insightful information from massive datasets and developing a culture of sound, data - driven decision making.
  • I am a good team player who likes to take initiative and seek out new challenges.
  • Proficient in Tableau to analyze and obtain insights into large datasets, create visually powerful and actionable interactive reports and dashboards.
  • Excellent communication skills; can work in a fast-paced, multitasking environment both independently and in a collaborative team; a self-motivated, enthusiastic learner.
  • Expertise in ETL processes using Python, RDS, and data lakes to extract, transform, and load large amounts of data, implemented on AWS for data warehousing and data migration purposes.

TECHNICAL SKILLS

Big Data Technologies: Hadoop, Spark, PySpark, Kafka, Spark Streaming, Zookeeper, Hive, HDFS, MapReduce, Sqoop

Languages: Python (Pandas, NumPy, Boto3, Matplotlib), SQL, HTML, CSS

AWS Services: EC2, ELB, EMR, Redshift, RDS, IAM, S3, AWS Lambda, EBS, EFS, CloudWatch, ECS, SNS, SQS

Databases: MySQL, Hive, Spark SQL, HQL, Redshift, RDS, DynamoDB

Reporting Tools/BI Tools: Tableau, Power BI, MS Office

Data Modelling Tools: MS Visio, Lucidchart, Rational Rose

Other tools: JIRA, GitHub

Operating Systems: Windows, Linux, CentOS 7, macOS, Android, Unix

PROFESSIONAL EXPERIENCE

Confidential

BigData Engineer

Responsibilities:

  • Worked on the design and implementation of serverless data pipelines for high-performance data integration processes, databases, storage, and other back-end services in fully virtualized environments on Samsung data, and stored the processed data in Hive.
  • Worked on implementing pipelines and analytical workloads using big data technologies such as Hadoop, Spark, Hive and HDFS.
  • Handled importing data from AWS S3 to HDFS and performed transformation and action functions using Spark to get the desired output (a minimal PySpark sketch follows this list).
  • Played a key role in building a Data Catalogue Manager tool to identify the personal-information and non-personal-information columns of data stored in Vertica, on-premise databases (USAS), and Hive.
  • Wrote cloud-based serverless pipelines on AWS to import data from Vertica and Hive into MySQL.
  • Designed and built data lake storage and ETL processes for device-insights streaming data using Kafka and Spark Streaming.
  • Handled loading clickstream data from AWS S3 to Hive using crontab and shell scripting.
  • Created data catalog tables that provide an overview of where data originated and where it was sent.
  • Worked closely with the Kafka admin team to set up a Kafka cluster, and implemented Kafka producer and consumer applications on the cluster with the help of Zookeeper.
  • Built a pipeline using Spark Streaming to receive real-time data from Apache Kafka and store the streamed data in HDFS (see the streaming sketch after this list).
  • Used the Kafka HDFS connector to export data from Kafka topics to HDFS files in a variety of formats, and integrated with Apache Hive to make the data immediately available for HQL querying.
  • Implemented Python scripts for backend database connectivity and data imports.
  • Built a serverless ETL in AWS Lambda to process files newly arriving in the S3 bucket so they are cataloged immediately (a Lambda sketch follows this list).
  • Built a Python module to access Jira, create issues for all the DB owners, and notify them every 7 days if an issue is not closed (a Jira API sketch follows this list).
  • Used AWS SQS to send the processed data to the next teams for further processing.
  • Deployed AWS Lambda functions to sync data from MySQL to the Client Portal.
  • Participated in daily scrum meetings.
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
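
A minimal PySpark sketch of the S3-to-Hive flow described above (illustrative only; bucket, paths, and table names are hypothetical placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("s3-to-hive-example")
    .enableHiveSupport()          # needed so saveAsTable writes to the Hive metastore
    .getOrCreate()
)

# Transformations: read JSON events from S3, keep valid records, stamp a processing date
raw = spark.read.json("s3a://example-raw-bucket/events/")          # placeholder path
cleaned = (
    raw.filter(F.col("event_id").isNotNull())
       .withColumn("processing_date", F.current_date())
)

# Action: materialize the result into a Hive table for downstream HQL querying
cleaned.write.mode("append").saveAsTable("analytics.events_cleaned")  # placeholder table
```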
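
A minimal streaming sketch of the Kafka-to-HDFS pipeline (illustrative; written against the Structured Streaming API, while the original pipeline may have used the older DStream API; broker, topic, and paths are placeholders):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-to-hdfs-example").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # placeholder broker
    .option("subscribe", "device-insights")              # placeholder topic
    .load()
)

# Kafka delivers key/value as binary; cast the value to string before persisting
messages = stream.select(F.col("value").cast("string").alias("payload"))

query = (
    messages.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/device_insights/")                   # placeholder HDFS path
    .option("checkpointLocation", "hdfs:///checkpoints/device_insights/")
    .start()
)
query.awaitTermination()
```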
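
A minimal AWS Lambda sketch of the S3-triggered cataloging step (illustrative; it registers new objects in a hypothetical DynamoDB catalog table, since the actual catalog store is not specified here):

```python
import boto3

dynamodb = boto3.resource("dynamodb")
catalog_table = dynamodb.Table("data_catalog")   # placeholder catalog table


def lambda_handler(event, context):
    # An S3 event notification can carry one or more records
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Register the newly arrived file in the catalog as soon as it lands
        catalog_table.put_item(
            Item={
                "object_key": key,
                "bucket": bucket,
                "size_bytes": record["s3"]["object"].get("size", 0),
                "event_time": record["eventTime"],
            }
        )
    return {"status": "cataloged", "count": len(event["Records"])}
```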
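
A minimal sketch of the Jira automation idea (illustrative; the Jira URL, project key, credentials, and use of the requests library are assumptions, and the 7-day reminder would come from whatever scheduler re-runs this check):

```python
import requests

JIRA_URL = "https://jira.example.com"        # placeholder URL
AUTH = ("svc-account", "api-token")          # placeholder credentials


def create_owner_issue(db_owner: str, table_name: str) -> str:
    """Open a Jira issue asking a DB owner to review a table's data classification."""
    payload = {
        "fields": {
            "project": {"key": "DATA"},                               # placeholder project
            "summary": f"Review PII classification for {table_name}",
            "description": f"Assigned to {db_owner}. Please classify the columns.",
            "issuetype": {"name": "Task"},
        }
    }
    resp = requests.post(f"{JIRA_URL}/rest/api/2/issue", json=payload, auth=AUTH)
    resp.raise_for_status()
    return resp.json()["key"]    # e.g. "DATA-123"
```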

Environment: Hadoop, Spark, Spark SQL, Hive, HQL, MySQL, HDFS, Shell Scripting, Crontab, Apache Kafka, Apache Spark Streaming, AWS RDS, Python, AWS Lambda, AWS EC2, JIRA, Tableau

Confidential, Arkansas

Data Analyst

Responsibilities:

  • Scheduled an ETL process through an S3 event trigger to load data from S3 into tables in AWS Redshift using AWS Lambda (a minimal sketch follows this list).
  • Utilized SQL queries, Python libraries, and MS Excel to filter and clean the data.
  • Responsible for developing customized reports and dashboards using Tableau Public.
  • Developed new reports and enhanced existing reports, connecting to single or multiple data sources including AWS RDS, AWS Redshift, Excel, and CSV files.
  • Converted data into actionable insights by predicting and modelling future outcomes.
  • Designed Tableau data visualizations utilizing cross-tabs, maps, scatter plots, pie, bar, and density charts.
  • Automated report generation and data ingestion processes from sources such as SQL, Excel, text files, and Redshift.
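
A minimal sketch of the S3-to-Redshift load described above (illustrative; assumes an S3-triggered Lambda issuing a Redshift COPY through psycopg2; the cluster, table, and IAM role are placeholders):

```python
import psycopg2

# Placeholder connection string and IAM role for the COPY command
REDSHIFT_DSN = "host=example-cluster.redshift.amazonaws.com dbname=analytics user=loader password=secret"
IAM_ROLE = "arn:aws:iam::123456789012:role/redshift-copy-role"


def lambda_handler(event, context):
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    # Redshift pulls the file directly from S3, so the Lambda only issues the COPY
    copy_sql = f"""
        COPY sales_staging
        FROM 's3://{bucket}/{key}'
        IAM_ROLE '{IAM_ROLE}'
        FORMAT AS CSV IGNOREHEADER 1;
    """
    with psycopg2.connect(REDSHIFT_DSN) as conn:
        with conn.cursor() as cur:
            cur.execute(copy_sql)
    return {"loaded": key}
```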

Environment: MySQL, AWS RDS, Tableau Public, Python(Pandas, Numpy, Matplotlib, Boto3), AWS S3

Confidential

Data Engineer

Responsibilities:

  • Worked with data scientists on data cleaning, data aggregation, and data transformations in Apache Spark, Python (NumPy, Pandas), and SQL.
  • Created Python jobs in Apache Airflow to schedule daily data aggregations from RDS to Amazon Redshift (an Airflow DAG sketch follows this list).
  • Experience with AWS services like EC2, Load Balancing, S3, Redshift and Lambda functions.
  • Worked on implementing relational database systems, using cloud-based database technologies
  • Worked with multi-terabyte datasets using relational databases (RDBMS) and SQL, performing data cleaning and creating performance metrics in Hive.
  • Collected and processed raw data at scale, including writing scripts, calling APIs, and writing SQL queries.
  • Experience in implementing data analysis with various analytic tools such as Anaconda and Jupyter Notebook.
  • Explored and extracted data from source XML files in HDFS, and used ETL to prepare data for exploratory analysis through data munging.
  • Responsible for ETL processes and frameworks for analytics and data management.
  • Involved in the design, prototyping, and delivery of software solutions within the big data ecosystem.
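
A minimal Apache Airflow sketch of the daily RDS-to-Redshift aggregation job (illustrative; the DAG and task names are placeholders and the aggregation body is stubbed out):

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def aggregate_rds_to_redshift():
    # In the real job this would query RDS (MySQL), aggregate the previous day's
    # data, and load the results into a Redshift table.
    print("Aggregating yesterday's RDS data into Redshift...")


with DAG(
    dag_id="daily_rds_to_redshift",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",      # run once per day
    catchup=False,
) as dag:
    PythonOperator(
        task_id="aggregate_rds_to_redshift",
        python_callable=aggregate_rds_to_redshift,
    )
```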

Environment: Apache Spark, Apache Airflow, SQL, Hive, HDFS, Jupyter Notebook, Python(NumPy, Pandas), AWS EC2, AWS S3, AWS Redshift, AWS Lambda, AWS RDS
