Data Engineer Resume
Columbus, Ohio
SUMMARY:
- Highly motivated, results-driven analyst with a proven track record in data science and analytics platforms.
- Experienced in applied mathematics, linear algebra, and probability and statistics.
- Experience with Python and PySpark on Cloudera distributions for data science projects.
- Good knowledge of software and programming, with emphasis on numerical algorithms, statistical modeling, mathematical theory, numerical libraries, and scientific computing.
- Excellent communication and interpersonal skills.
TECHNICAL SKILLS:
Platforms: Windows 7/8/10, Ubuntu
Products: Anaconda, RStudio, Tableau Desktop, Jupyter, MATLAB, Cloudera
Databases: Microsoft SQL Server, Hive
Programming Languages: C, Python, R, Java
Software: RStudio, Spyder, Microsoft SQL Server, Tableau Desktop, Java Platform, Turbo C, R Shiny, MATLAB and Simulink, Minitab 18, XLMiner, Sqoop, HDFS, Hive, Spark, Flume
PROFESSIONAL EXPERIENCE:
Confidential, Columbus, Ohio
Data Engineer
Technologies Leveraged: 8-node cluster with AtScale on the edge node in the cloud, Cloudera CDH 5.15, AtScale 7.2, Tableau 2018, Hive, Sqoop, Impala, Spark
Responsibilities:
- Installation and setup of a multi-node Cloudera cluster on the AWS cloud
- Installation and setup of AtScale on top of the Hadoop cluster, using Hive and Impala as the SQL engines (see the sketch after this list)
- Development of cubes involving multiple facts and dimensions
- Development of calculations, leveraging Query Data Sets
- Defining and managing aggregates
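A minimal sketch of querying the published cube layer through Impala from Python, assuming the impyla client; the host, port, and fact/dimension table names below are hypothetical:

from impala.dbapi import connect

# Connect to the Impala daemon on the edge node (hypothetical host/port).
conn = connect(host="edge-node.example.com", port=21050)
cur = conn.cursor()

# Aggregate over a fact table joined to a dimension; AtScale can route
# such a query to a pre-defined aggregate when one matches.
cur.execute("""
    SELECT d.region, SUM(f.sales_amount) AS total_sales
    FROM sales_fact f
    JOIN region_dim d ON f.region_id = d.region_id
    GROUP BY d.region
""")
for region, total in cur.fetchall():
    print(region, total)

cur.close()
conn.close()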
Confidential
Technologies Leveraged: Hadoop, Cloudera 5.15, Spark 2.1, HDFS, Python, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Java, Unix
Responsibilities:
- Developed a mapping from each data-ingestion type to the corresponding ingestion program (Sqoop, Pig, or Kafka).
- Developed ingestion capability using Sqoop, Kafka, and Pig; leveraged Spark for data processing and transformation.
- Developed the real-time / near-real-time framework using Kafka and Flume capabilities.
- Developed a framework for deciding on data formats such as Parquet, Avro, and ORC.
- Developed Spark code using Python and Spark SQL for faster processing and testing.
- Worked with Spark SQL to join multiple Hive tables, write the result to a final Hive table, and store it on S3 (see the sketch after this list).
- Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
- Performed querying of both managed and external tables created in Hive.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
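A minimal sketch of the Hive-join-to-S3 step in PySpark; the table, column, and bucket names are hypothetical:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-join-to-s3")
         .enableHiveSupport()      # expose the Hive metastore to Spark SQL
         .getOrCreate())

# Join two Hive tables with Spark SQL.
joined = spark.sql("""
    SELECT o.order_id, o.amount, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
""")

# Persist the result as a final Hive table stored as Parquet on S3.
(joined.write
       .mode("overwrite")
       .format("parquet")
       .option("path", "s3a://my-bucket/warehouse/orders_enriched")
       .saveAsTable("orders_enriched"))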
Confidential
Technologies Leveraged: Hadoop, Cloudera, Spark, HBase, HDFS, Python, MapReduce, YARN, Hive, Pig, Sqoop, Oozie, Tableau.
Responsibilities:
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark.
- Developed Spark code using Python and Spark SQL for faster processing and testing.
- Worked with Spark SQL to join multiple Hive tables, write the result to a final Hive table, and store it on S3.
- Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
- Created Spark jobs to run lightning-fast analytics on the Spark cluster.
- Extracted files from Teradata through Sqoop, placed them in HDFS, and processed them.
- Responsible for storing processed data in HBase (see the sketch after this list).
- Performed querying of both managed and external tables created in Hive.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Fetched and generated monthly reports and visualized them using Tableau.
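A minimal sketch of the HBase storage step, assuming the happybase client talking to an HBase Thrift gateway; the host, table, and column-family names are hypothetical:

import happybase

# Connect through the HBase Thrift server (hypothetical host).
connection = happybase.Connection("hbase-thrift.example.com")
table = connection.table("processed_data")

# Write each processed record under its row key, into a 'd' column family.
for row in [{"id": "r1", "score": "0.87"}, {"id": "r2", "score": "0.42"}]:
    table.put(row["id"].encode(), {b"d:score": row["score"].encode()})

connection.close()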
Confidential
Junior Data Scientist
Responsibilities:
- Created a CIBIL-style score for each retailer under the company to assess the retailer's creditworthiness.
- Preprocessed three years of raw, unstructured data into a structured format.
- Using multiple years of data and different factors, created a score for each retailer and built a naïve Bayes model of each retailer's worthiness for the upcoming years (see the sketch after this list).
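A minimal sketch of the naïve Bayes worthiness model using scikit-learn; the factor names, file name, and the rescaling to a 0-100 score are hypothetical:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("retailer_factors.csv")  # hypothetical structured data
X = df[["avg_monthly_sales", "payment_delay_days", "years_active"]]
y = df["worthy"]  # 1 = creditworthy, 0 = not

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
model = GaussianNB().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# The predicted probability of worthiness can be rescaled to a
# CIBIL-style 0-100 score per retailer.
df["score"] = model.predict_proba(X)[:, 1] * 100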
Environment: MS Office, R Studio, Tableau Desktop, R Shiny, Python 3.2.7
Confidential
Responsibilities:
- Created prediction models for whether a candidate would join the company.
- Used multivariate analysis to see how each factor affected the outcome, and used multiple linear regression to find the weight of each factor.
- Alongside the regression, built decision-tree and k-NN models and concluded that the decision tree was the best-fitting model for this data (see the sketch after this list).
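A minimal sketch of the model comparison with scikit-learn; the factors and file name are hypothetical:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("candidates.csv")
X = df[["offered_hike_pct", "notice_period_days", "interview_score"]]
y = df["joined"]  # 1 = joined, 0 = did not join

# Factor weights from a multiple linear regression on the 0/1 outcome.
weights = LinearRegression().fit(X, y).coef_
print(dict(zip(X.columns, weights)))

# Cross-validated accuracy of the decision-tree and k-NN classifiers.
for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    print(name, cross_val_score(clf, X, y, cv=5).mean())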
Environment: RStudio, R Shiny, MS Office, Tableau Desktop, Python 3.2.7
Confidential
Junior Data Scientist
Responsibilities:
- Created an algorithm using the TensorFlow module in Python in which the camera identifies the contours of the hand and, based on them, classifies the gesture being made (see the sketch after this list).
- Trained the model on more than 3,000 images so that it could learn to recognize each gesture.
- Integrated the algorithm on a Raspberry Pi so it could be used as required.
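A minimal sketch of the gesture pipeline: OpenCV finds the hand contour, and a trained TensorFlow model classifies the cropped region. The model file, input size, and gesture labels are hypothetical:

import cv2
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("gesture_cnn.h5")  # hypothetical model
labels = ["fist", "palm", "thumbs_up"]                # hypothetical classes

cap = cv2.VideoCapture(0)  # default camera (e.g. a Pi camera)
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, 0, 255,
                            cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        # Crop around the largest contour, assumed to be the hand.
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        roi = cv2.resize(gray[y:y + h, x:x + w], (64, 64)) / 255.0
        probs = model.predict(roi.reshape(1, 64, 64, 1))
        print("gesture:", labels[int(np.argmax(probs))])
cap.release()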
Environment: Python 3.2.7, OpenCV, TensorFlow, Pandas, Raspberry Pi 3
Confidential
Junior Data Scientist
Responsibilities:
- Created an algorithm to extract emails, addresses, skills, experience, and names from candidate resumes.
- Built the algorithm on the Python libraries textract and NLTK.
- Created a ranking based on each candidate's skills and years of experience: a score from 0-100 that gives priority to skills over experience (see the sketch after this list).
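A minimal sketch of the extraction and 0-100 scoring; the skill vocabulary and weighting scheme are hypothetical (NLTK's "punkt" tokenizer data is assumed to be installed):

import re
import textract
import nltk

# textract handles .docx, .pdf, etc.; it returns bytes.
text = textract.process("resume.docx").decode("utf-8")

# Email extraction with a regular expression.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)

# Skill matching against a known vocabulary using NLTK tokens.
tokens = {t.lower() for t in nltk.word_tokenize(text)}
skills = tokens & {"python", "spark", "hive", "sql", "tableau"}

# Years of experience from phrases like "5 years".
years = max((int(m) for m in re.findall(r"(\d+)\s+years?", text)), default=0)

# 0-100 score: skills weighted more heavily than experience.
score = min(100, len(skills) * 15 + min(years, 10) * 2.5)
print(emails, sorted(skills), years, score)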
Environment: Python 3.2.7, textract, NLTK, docx2txt, Microsoft Office