
Data Engineer (Big Data) Resume

SUMMARY

  • A technologist well versed in the use of Tableau, Power BI, and ETL processes in Python and Talend.
  • Capable of working with deep neural networks and other machine learning models.
  • Fluent in the use of Amazon Web Services cloud services for IT automation and data warehousing.

TECHNICAL SKILLS

Databases: PostgreSQL, SQL Server, MySQL

ETL Tools: Talend, Hadoop, RDS, S3, Redshift, Lambda, Kinesis, BigQuery

Reporting Tools: Tableau, Power BI, Qlik Sense

Languages: Python, Java, SQL, Scala, R, JavaScript, HTML

Operating Systems: Ubuntu, Amazon AMI Linux

Other Tools: Tableau, D3.js, Spark, Hive, Highcharts, Qlik Sense, Datalab, Mapbox, Kafka, Airflow

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer (Big Data)

Responsibilities:

  • Develop technical architectures, designs, and processes to extract, cleanse, integrate, organize and present data from a variety of sources and formats for analysis and use across use cases.
  • Perform data profiling, discovery, and analysis to determine the location, suitability, and coverage of data, and to identify the data types, formats, and data quality within a given data source.
  • Work with source-system and business SMEs to develop an understanding of the data requirements and the options available within customer sources to meet the data and business requirements.
  • Perform hands-on data development for data extraction, movement, and integration, leveraging state-of-the-art tools and practices, including both streaming and batch ingestion techniques.
  • Used Scala to build data processing workflows in Apache Spark.
  • Developed a streaming data pipeline for Spark in Scala.
  • Used Hadoop, Python, AWS Lambda, and PostgreSQL to maintain the backend data systems.
  • Worked in HDFS and Apache Spark to transform and integrate intellectual property case files.
  • Used Spark DataFrames to clean case file data.
  • Parsed XML files into DataFrames for ingestion into the case file database (see the sketch after this list).
  • Built and maintained data pipelines for ingesting data from the USPTO and CIPO trademark databases.
  • Built an NLP codebase that tags trademark images and parses trademark descriptions.
  • Trained a deep learning model to detect trademark infringement; used Jenkins and Ansible to manage production servers.
  • Analyzed big data sets and built interactive user dashboards in Tableau on Hadoop.
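
A minimal sketch of the XML-to-DataFrame ingestion step, shown in PySpark. The file path, record tags, and field names below are illustrative assumptions, not the actual USPTO case file schema.

```python
import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("casefile-ingest").getOrCreate()

def parse_case_file(path):
    """Yield (serial_number, mark_text) tuples from one XML case file."""
    root = ET.parse(path).getroot()
    for case in root.iter("case-file"):  # assumed record tag
        yield (
            case.findtext("serial-number"),
            case.findtext("case-file-header/mark-identification"),
        )

rows = list(parse_case_file("case-files.xml"))  # hypothetical input file
df = spark.createDataFrame(rows, ["serial_number", "mark_text"])
df.write.mode("append").parquet("hdfs:///warehouse/casefiles")  # illustrative path
```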

Confidential

Business Intelligence/Data Engineer

Responsibilities:

  • Used HDFS and Apache Spark to warehouse donor and social media data.
  • Used Scala and Spark to develop a batch data warehouse for transactional data.
  • Designed Tableau dashboards for web-based platforms and internal reporting.
  • Prepared dashboards using calculations, parameters, and calculated hierarchies in Tableau.
  • Used table calculations, drill-downs, and level-of-detail (LOD) expressions to perform advanced analytics and visualizations.
  • Used Hadoop, Python, AWS Lambda, and PostgreSQL to maintain the data systems.
  • Used Talend to build Spark ETL pipelines that import and transform large structured and unstructured datasets from internal databases and APIs.
  • Designed an automated preprocessing and ingestion workflow covering historical data extraction and ongoing file feeds from multiple sources in various formats, such as fixed-length flat files, delimited flat files, JSON, and XML.
  • Used Talend to build ETL workflows that clean, reshape, and transform datasets.
  • Used ETL tools to maintain data integrity and quality; used QGIS and Mapbox to perform geospatial data analytics.
  • Used scikit-learn for statistical analysis: tests between groups, correlations, and projections.
  • Used Hive to query data in HDFS (see the sketch after this list).
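
A minimal sketch of querying Hive-managed data in HDFS from Spark. The database, table, and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-query")
    .enableHiveSupport()  # lets spark.sql() use the Hive metastore
    .getOrCreate()
)

# Aggregate donor activity by month from a Hive table stored in HDFS.
monthly = spark.sql("""
    SELECT date_format(donated_at, 'yyyy-MM') AS month,
           count(*)                           AS donations,
           sum(amount)                        AS total_amount
    FROM   donors.donations
    GROUP  BY date_format(donated_at, 'yyyy-MM')
    ORDER  BY month
""")
monthly.show()
```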

Confidential

Business Intelligence Developer/Data Engineer

Responsibilities:

  • Used Spark ETL tooling to integrate, cleanse, and join datasets (a join sketch follows the Environment line).
  • Worked with a team of two other researchers to devise methods for data profiling and representation.
  • Designed and extracted a taxonomy of categorical data for statistical survey design.
  • Designed workflows in Tableau
  • Provided technical and cartographic services to research team

Environment: Tableau, Hadoop
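
A minimal sketch of the integrate/cleanse/join pattern in Spark, assuming survey data; the dataset paths and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("survey-etl").getOrCreate()

responses = spark.read.parquet("hdfs:///data/survey_responses")   # hypothetical
respondents = spark.read.parquet("hdfs:///data/respondents")      # hypothetical

cleaned = (
    responses
    .dropDuplicates(["response_id"])                # remove duplicate records
    .withColumn("answer", F.trim(F.col("answer")))  # normalize whitespace
    .filter(F.col("answer").isNotNull())            # drop empty answers
)

# Join responses to respondent attributes for downstream profiling.
joined = cleaned.join(respondents, on="respondent_id", how="inner")
joined.write.mode("overwrite").parquet("hdfs:///data/survey_joined")
```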

Confidential

Data Engineer

Responsibilities:

  • Used Python to write data ingestion scripts against the data vendor's API (see the sketch after the Environment line).
  • Designed an automated preprocessing and ingestion workflow covering historical data extraction and ongoing file feeds from multiple sources in various formats, such as fixed-length flat files, delimited flat files, JSON, and XML.
  • Transformed and loaded traffic time series data to the EDL; performed feature extraction, exploration, visualization, and transformation using pandas and scikit-learn in Python.
  • Built and maintained full stack systems on AWS for displaying graphics on a web-based platform
  • Designed, built and maintained data pipelines and systems for data ingestion from APIs
  • Developed and maintained front-end web pages to render graphical data
  • Utilized AWS RDS and EC2 to host the data, with Highcharts as the front-end visualization interface to display it.

Environment: AWS, RDS, EC2, Highcharts, Hadoop.
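
A minimal sketch of a vendor-API ingestion script in Python; the endpoint, token, pagination scheme, and response envelope are hypothetical stand-ins for the actual vendor API.

```python
import requests
import pandas as pd

API_URL = "https://api.example-vendor.com/v1/traffic"  # illustrative endpoint
API_TOKEN = "..."  # placeholder; supplied by the vendor

def fetch_page(page):
    """Fetch one page of records from the vendor API."""
    resp = requests.get(
        API_URL,
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        params={"page": page, "per_page": 500},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["records"]  # assumed response envelope

# Page through the API and stage the result for loading downstream.
records, page = [], 1
while True:
    batch = fetch_page(page)
    if not batch:
        break
    records.extend(batch)
    page += 1

pd.DataFrame(records).to_csv("traffic_raw.csv", index=False)
```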

Confidential

Open Data Analyst

Responsibilities:

  • Wrote PySpark jobs to ingest and process data in the data warehouse (see the sketch after this list).
  • Used Tableau to produce custom maps based on census data.
  • Used Hadoop, Python, and PostgreSQL to maintain the backend data systems.
  • Created custom tabulation views of data using Tableau and other ETL tools.
  • Segmented datasets for use on the Namara platform.
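
A minimal sketch of a PySpark ingest-and-process job landing in PostgreSQL; the source path, table name, and connection details are illustrative, and the PostgreSQL JDBC driver is assumed to be on the Spark classpath.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("open-data-ingest").getOrCreate()

# Ingest raw census extracts and standardize a few columns.
census = (
    spark.read.option("header", True).csv("hdfs:///raw/census/*.csv")
    .withColumn("population", F.col("population").cast("long"))
    .withColumn("region", F.upper(F.col("region")))
)

# Write the processed slice into the warehouse over JDBC.
(census.write.format("jdbc")
    .option("url", "jdbc:postgresql://warehouse-host:5432/opendata")
    .option("dbtable", "census_clean")
    .option("user", "etl_user")      # placeholder credentials
    .option("password", "...")
    .mode("append")
    .save())
```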

Confidential

Intern/Data Engineer

Responsibilities:

  • Worked on an Apache Spark (Scala) data processing pipeline in IBM Bluemix.
  • Connected databases and other sources to Tableau and built Tableau dashboards for student profile data.
  • Produced prototype projects for internship teams.
  • Prepared projects on the IBM Bluemix platform.
