
Data Engineer Resume


SUMMARY

  • Around 7 years of professional experience as a Data Engineer designing data-intensive applications using the Hadoop ecosystem, big data analytics, cloud data engineering, data warehousing/data marts and OLAP cubes, data visualization, reporting, and SQL
  • Knowledge of handling structured (tabular) as well as unstructured (NoSQL) datasets
  • Proficient in Spark with Python (PySpark), Sqoop, Hive, and Kafka
  • Strong knowledge of writing SQL queries and shell scripts
  • Practical understanding of data modeling (dimensional and relational) concepts such as ER modeling, star-schema modeling, snowflake-schema modeling, and fact and dimension modeling
  • Extensive AWS experience, including services such as EMR, EC2, IAM, Lambda, and S3
  • Hands-on experience in Bash scripting and building data pipelines.
  • Strong knowledge of data preparation, data modeling, and data visualization using Power BI and Tableau, including developing reports and dashboards in both
  • Experience using Sqoop to import and export data between relational database management systems and Hive/HDFS
  • Knowledge of various file formats in HDFS such as Avro, ORC, and Parquet
  • Excellent communication and interpersonal skills; keen to adopt new technologies
  • Experience with partitioning and bucketing concepts in Hive; designed and managed external tables in Hive to optimize performance (a minimal PySpark sketch follows this list)
  • Extensive knowledge of Python packages such as pandas, NumPy, Matplotlib, and scikit-learn for deriving relationships in the data, used hand in hand with SQL, HiveQL, and PySpark MLlib to reduce processing times for generated reports (see the pandas/scikit-learn sketch after this list)
  • Building and productionizing predictive models on large datasets by utilizing advanced statistical modeling, machine learning, or other data mining techniques.
  • Hands-on experience with different programming languages such as Python, SAS, C++, Java, and C
  • In-depth knowledge of Hadoop architecture and its components such as YARN, HDFS, NameNode, DataNode, JobTracker, ApplicationMaster, ResourceManager, TaskTracker, and the MapReduce programming paradigm.
  • Strong knowledge of distributed-systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm and the Spark execution framework.
  • Expert in designing parallel jobs using various stages such as Join, Merge, Lookup, Remove Duplicates, Filter, Dataset, Lookup File Set, Complex Flat File, Modify, Aggregator, and XML.
  • Experienced in scraping data from the web by writing Python scripts with Beautiful Soup, and with Selenium WebDriver for dynamic webpage scraping (see the scraping sketch after this list).
  • Experienced in improving the performance and optimization of existing algorithms in Hadoop using Spark, via SparkContext, Spark SQL, the DataFrame API, Spark Streaming, and pair RDDs; worked explicitly with PySpark and Scala (see the optimization sketch after this list).
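
To ground the Hive partitioning and external-table bullet above, here is a minimal PySpark sketch, assuming a Hive-enabled SparkSession; the table name, columns, and S3 location are all hypothetical.

```python
from pyspark.sql import SparkSession

# Assumes Spark built with Hive support; every name below is hypothetical.
spark = (
    SparkSession.builder
    .appName("hive-external-table-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# External table: Hive owns only the metadata; the Parquet files stay at
# LOCATION, so dropping the table does not delete the underlying data.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_events (
        order_id STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (event_date STRING)   -- one directory per day under LOCATION
    STORED AS PARQUET
    LOCATION 's3://example-bucket/warehouse/sales_events/'
""")

# Register partition directories already present under LOCATION.
spark.sql("MSCK REPAIR TABLE sales_events")

# Filtering on the partition column prunes the scan to matching directories.
spark.sql("SELECT COUNT(*) FROM sales_events WHERE event_date = '2024-01-01'").show()
```

Bucketing would add a CLUSTERED BY ... INTO n BUCKETS clause to the same DDL; the partition filter is what delivers the performance win mentioned in the bullet.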
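The pandas/NumPy/scikit-learn bullet, in sketch form: synthetic data stands in for a report extract, with a correlation pass followed by a simple predictive model. The column names and revenue formula are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for an extract that would normally come from Hive/SQL.
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "units_sold": rng.integers(1, 100, size=1_000),
    "discount":   rng.uniform(0.0, 0.3, size=1_000),
})
df["revenue"] = df["units_sold"] * (1.0 - df["discount"]) * 9.99  # invented formula

# Pairwise correlations: a quick first look at relationships in the data.
print(df.corr())

# Simple predictive model of revenue from the other columns.
model = LinearRegression().fit(df[["units_sold", "discount"]], df["revenue"])
print(model.coef_, model.intercept_)
```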
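The web-scraping bullet, in sketch form: Selenium WebDriver renders a JavaScript-heavy page, then Beautiful Soup parses the rendered HTML. The URL and CSS selector are hypothetical, and a Chrome driver is assumed to be available (Selenium 4 can resolve one automatically).

```python
from bs4 import BeautifulSoup
from selenium import webdriver

URL = "https://example.com/listings"  # hypothetical dynamic page

driver = webdriver.Chrome()  # Selenium 4 locates a driver via Selenium Manager
try:
    driver.get(URL)
    html = driver.page_source  # fully rendered DOM, including JS-injected content
finally:
    driver.quit()

# Parse the rendered HTML; the selector below is hypothetical.
soup = BeautifulSoup(html, "html.parser")
titles = [node.get_text(strip=True) for node in soup.select("h2.listing-title")]
print(titles)
```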
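And the Spark-optimization bullet, in sketch form: a broadcast join avoids shuffling the large fact table, and caching keeps a reused aggregate in memory across downstream report queries. Paths and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("spark-optimization-sketch").getOrCreate()

# Hypothetical inputs: a large fact table and a small dimension table.
orders  = spark.read.parquet("s3://example-bucket/orders/")   # large
regions = spark.read.parquet("s3://example-bucket/regions/")  # small

# Broadcasting the small side ships it to every executor, so the large side
# is joined in place instead of being shuffled across the cluster.
joined = orders.join(F.broadcast(regions), on="region_id")

# Cache an aggregate that several reports reuse, instead of recomputing it.
daily = (
    joined.groupBy("region_name", "order_date")
          .agg(F.sum("amount").alias("total_amount"))
          .cache()
)
daily.show(5)
```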

TECHNICAL SKILLS

Programming: Python, R, SAS, Java, Shell, C, C++

Big Data Technologies: Hadoop, MapReduce, HDFS, Spark, Hive, Sqoop, Kafka, Mahout, Spark MLlib

Databases: MySQL, Oracle, SQL Server, MongoDB, Cassandra, PostgreSQL

Cloud Technologies: Google BigQuery, AWS (S3, EC2), Microsoft Azure

Reporting: Power BI, Tableau

Tools: PyCharm, Eclipse, SQL Server Management Studio, Spyder, Access, Erwin

Operating Systems: Windows 7/8/XP/2011/Vista, Ubuntu, macOS
