
Data Engineer Resume


Columbus, Ohio

SUMMARY:

  • Highly motivated and results-driven analyst with a proven track record in data science and analytics platforms
  • Experienced in applied mathematics, linear algebra, and probability and statistics
  • Experience with Python and PySpark for Cloudera distributions and data science projects
  • Good knowledge of software and programming, with emphasis on numerical algorithms, statistical modelling, mathematical theory, numerical libraries, and scientific computing
  • Excellent communication and interpersonal skills.

TECHNICAL SKILLS:

Platforms: Windows 7/8/10, Ubuntu

Products: Anaconda, RStudio, Tableau Desktop, Jupyter, MATLAB, Cloudera

Databases: SQL, Hive

Programming Languages: C, Python, R, Java

Software: RStudio, Spyder, Microsoft SQL Server, Tableau Desktop, Java Platform, Turbo C, R Shiny, MATLAB and Simulink, Minitab 18, XLMiner, Sqoop, HDFS, Hive, Spark, Flume

PROFESSIONAL EXPERIENCE:

Confidential, Columbus, Ohio

Data Engineer

Technologies Leveraged: 8-node cluster with AtScale on the edge node in the cloud, Cloudera CDH 5.15, AtScale 7.2, Tableau 2018, Hive, Sqoop, Impala, Spark

Responsibilities:

  • Installation and setup of a multi-node Cloudera cluster on the AWS cloud
  • Installation and setup of AtScale on top of the Hadoop cluster, using Hive and Impala as the SQL engines
  • Development of cubes involving multiple facts and dimensions (see the sketch after this list)
  • Development of calculations, leveraging Query Datasets
  • Defining and managing aggregates
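
For illustration, a minimal sketch of querying a published cube through the Hive SQL engine from Python; the host, database, and `sales_fact` schema are hypothetical placeholders, not the actual project setup.

```python
# Minimal sketch: querying an AtScale-published cube through Hive from Python.
# Host, port, username, and the sales_cube schema/fields are hypothetical.
from pyhive import hive

conn = hive.Connection(host="edge-node.example.com", port=10000,
                       username="analyst", database="sales_cube")
cursor = conn.cursor()

# Aggregate query against the cube; AtScale can resolve it to a pre-built
# aggregate where one exists, otherwise it falls through to the base tables.
cursor.execute("""
    SELECT order_year, SUM(revenue) AS total_revenue
    FROM sales_fact
    GROUP BY order_year
    ORDER BY order_year
""")

for order_year, total_revenue in cursor.fetchall():
    print(order_year, total_revenue)

cursor.close()
conn.close()
```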

Confidential

Technologies Leveraged: Hadoop, Cloudera 5.15, Spark 2.1, HDFS, Python, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Java, Unix

Responsibilities:

  • Developed a mapping of data ingestion types to the corresponding ingestion programs (Sqoop, Pig, or Kafka)
  • Developed ingestion capability using Sqoop, Kafka, and Pig; leveraged Spark for data processing and transformation
  • Developed the real-time / near-real-time framework using Kafka and Flume capabilities
  • Developed a framework to decide on data formats such as Parquet, Avro, and ORC
  • Developed Spark code using Python and Spark SQL for faster processing and testing
  • Worked on Spark SQL to join multiple Hive tables, write the results to a final Hive table, and store them on S3 (see the sketch after this list)
  • Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations
  • Performed querying of both managed and external tables created in Hive
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
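
A minimal sketch of the Hive-join-to-S3 pattern above, assuming hypothetical `orders` and `customers` Hive tables and a placeholder S3 bucket.

```python
# Minimal sketch: joining Hive tables with Spark SQL and writing the result
# to a final Hive table backed by S3. Table and bucket names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-join-to-s3")
         .enableHiveSupport()
         .getOrCreate())

# Join two Hive tables with Spark SQL.
joined = spark.sql("""
    SELECT o.order_id, o.amount, c.customer_name, c.region
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""")

# Persist as Parquet on S3 and register the result as a Hive table.
(joined.write
       .mode("overwrite")
       .option("path", "s3a://example-bucket/warehouse/orders_enriched")
       .saveAsTable("orders_enriched"))

spark.stop()
```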

Confidential

Technologies Leveraged: Hadoop, Cloudera, Spark, HBase, HDFS, Python, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Tableau.

Responsibilities:

  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark
  • Developed Spark code using Python and Spark SQL for faster processing and testing
  • Worked on Spark SQL to join multiple Hive tables, write the results to a final Hive table, and store them on S3
  • Implemented Spark RDD transformations to map business logic and applied actions on top of those transformations (see the sketch after this list)
  • Created Spark jobs to run lightning-fast analytics over the Spark cluster
  • Extracted data from Teradata through Sqoop, placed it in HDFS, and processed it
  • Responsible for storing processed data in HBase
  • Performed querying of both managed and external tables created in Hive
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive
  • Fetched and generated monthly reports and visualized them using Tableau
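
A minimal sketch of the RDD transformation/action pattern above; the HDFS path and the record layout (id, region, amount) are hypothetical.

```python
# Minimal sketch: Spark RDD transformations followed by actions.
# The HDFS path and the comma-delimited record layout are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-transformations")

lines = sc.textFile("hdfs:///data/transactions/*.csv")

# Transformations are lazy: parse each line, keep valid rows,
# and map each record to a (region, amount) pair.
pairs = (lines.map(lambda line: line.split(","))
              .filter(lambda fields: len(fields) == 3)
              .map(lambda fields: (fields[1], float(fields[2]))))

# Actions trigger execution: total amount per region, then collect locally.
totals = pairs.reduceByKey(lambda a, b: a + b)
for region, total in totals.collect():
    print(region, total)

sc.stop()
```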

Confidential

Junior Data Scientist

Responsibilities:

  • Created a CIBIL-style score for each retailer under the company to assess their creditworthiness
  • Preprocessed three years of raw, unstructured data into a structured format
  • Using multiple years of data and different factors, created a score for each retailer and built a Naïve Bayes model of each retailer's worthiness for the upcoming years (see the sketch after this list)
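
A minimal sketch of the scoring approach, assuming a Python/scikit-learn implementation; the feature names and the 0-100 scale mapping are hypothetical illustrations.

```python
# Minimal sketch: scoring retailer worthiness with a Naive Bayes classifier.
# The input file, feature columns, and scoring scale are hypothetical.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical structured data produced by the preprocessing step.
df = pd.read_csv("retailers.csv")  # columns: sales_growth, payment_delay_days,
                                   # return_rate, order_volume, worthy (0/1)
X = df[["sales_growth", "payment_delay_days", "return_rate", "order_volume"]]
y = df["worthy"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

model = GaussianNB()
model.fit(X_train, y_train)

# Use the predicted probability of worthiness as a 0-100 CIBIL-style score.
df["score"] = (model.predict_proba(X)[:, 1] * 100).round(1)
print("Holdout accuracy:", model.score(X_test, y_test))
```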

Environment: MS Office, RStudio, Tableau Desktop, R Shiny, Python 3.2.7

Confidential

Responsibilities:

  • Created prediction models of whether a candidate would join a company or not
  • Used multivariate analysis to see how each factor affected the end result, and multiple linear regression to find the weight of each factor
  • Alongside multiple linear regression, created models based on a decision tree and k-NN, and concluded that the decision tree was the best-fit model for this data (see the sketch after this list)
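
A minimal sketch of the model comparison, assuming a scikit-learn implementation; the input file, feature columns, and hyperparameters are hypothetical, and a logistic regression stands in as the linear baseline for the binary join/no-join target.

```python
# Minimal sketch: comparing candidate models for the "will they join?" target.
# The CSV file and column names are hypothetical; features assumed numeric.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("candidates.csv")  # e.g. notice_period, salary_hike_pct,
                                    # interview_score, joined (0/1)
X = df.drop(columns=["joined"])
y = df["joined"]

models = {
    "logistic (linear baseline)": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "k-NN": KNeighborsClassifier(n_neighbors=7),
}

# 5-fold cross-validated accuracy for each model.
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```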

Environment: RStudio, R Shiny, MS Office, Tableau Desktop, Python 3.2.7

Confidential

Junior Data Scientist

Responsibilities:

  • Created an algorithm using the TensorFlow module in Python where the camera identifies the contours of the hand and, based on those, classifies the gesture being made
  • To automate this process, trained the algorithm on more than 3,000 images so that it could learn the gestures being shown
  • Integrated the algorithm on a Raspberry Pi so it could be used as required (see the sketch after this list)
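
A minimal sketch of the contour-plus-classifier loop, with a hypothetical gesture label set and a hypothetical pre-trained weights file; a real pipeline still needs the ~3,000 labelled training images described above.

```python
# Minimal sketch: hand-contour extraction with OpenCV feeding a small
# TensorFlow/Keras classifier. GESTURES and gesture_cnn.h5 are hypothetical.
import cv2
import numpy as np
import tensorflow as tf

GESTURES = ["fist", "palm", "thumbs_up"]  # hypothetical label set
model = tf.keras.models.load_model("gesture_cnn.h5")  # pre-trained weights

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break

    # Threshold the grayscale frame and find the largest contour (the hand).
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)
        x, y, w, h = cv2.boundingRect(hand)
        crop = cv2.resize(mask[y:y + h, x:x + w], (64, 64)) / 255.0

        # Classify the cropped hand region and overlay the predicted label.
        probs = model.predict(crop.reshape(1, 64, 64, 1), verbose=0)[0]
        label = GESTURES[int(np.argmax(probs))]
        cv2.putText(frame, label, (x, y - 10),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```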

Environment: Python 3.2.7, OpenCV, TensorFlow, Pandas, Raspberry Pi 3

Confidential

Junior Data Scientist

Responsibilities:

  • Created an algorithm to extract emails, addresses, skills, experience, and names from resumes
  • Built the algorithm on the Python libraries textract and NLTK
  • Created a ranking based on skills and years of experience: a score from 0-100 that gives priority to skills over experience (see the sketch after this list)
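
A minimal sketch of the extraction and scoring steps; the skill list, score weights, and regexes are hypothetical illustrations, not the actual project rules.

```python
# Minimal sketch: extracting fields from a resume with textract and NLTK,
# then computing a 0-100 score weighted toward skills over experience.
# SKILLS, the 70/30 weights, and the regexes are hypothetical.
import re
import textract
import nltk

nltk.download("punkt", quiet=True)  # tokenizer data for word_tokenize

SKILLS = {"python", "spark", "hive", "sql", "tableau"}  # hypothetical

text = textract.process("resume.pdf").decode("utf-8")

# Simple regex extraction of contact details and experience mentions.
email = re.search(r"[\w.+-]+@[\w-]+\.[\w.]+", text)
years = re.findall(r"(\d+)\+?\s*years?", text, flags=re.IGNORECASE)

# Tokenize with NLTK and match tokens against the skill list.
tokens = {t.lower() for t in nltk.word_tokenize(text)}
matched = SKILLS & tokens

# Score 0-100: 70% weight on skill coverage, 30% on experience (capped at 10y).
experience = min(max((int(y) for y in years), default=0), 10)
score = 70 * len(matched) / len(SKILLS) + 30 * experience / 10

print("email:", email.group(0) if email else None)
print("skills:", sorted(matched), "score:", round(score, 1))
```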

Environment: Python 3.2.7, textract, NLTK, docx2txt, Microsoft Office
