Data Engineer (Big Data) Resume
SUMMARY
- A technologist well versed in the use of Tableau, Power BI, and ETL processes in Python and Talend.
- Capable of working with deep neural networks and other machine learning models.
- Fluent in the use of Amazon Web Services cloud services for IT automation and data warehousing.
TECHNICAL SKILLS
Databases: PostgreSQL, SQL Server, MySQL
ETL Tools: Talend, Hadoop, RDS, S3, Redshift, Lambda, Kinesis, BigQuery
Reporting Tools: Tableau, Power BI, Qlik Sense
Languages: Python, Java, SQL, Scala, R, JavaScript, HTML
Operating Systems: Ubuntu, Amazon Linux AMI
Other Tools: Tableau, D3.js, Spark, Hive, Highcharts, Qlik Sense, Datalab, Mapbox, Kafka, Airflow
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer (Big Data)
Responsibilities:
- Develop technical architectures, designs, and processes to extract, cleanse, integrate, organize, and present data from a variety of sources and formats for analysis and use across use cases.
- Perform data profiling, discovery, and analysis to determine the location, suitability, and coverage of data, and to identify the data types, formats, and data quality within a given data source.
- Work with source system and business SMEs to develop an understanding of the data requirements and the options available within customer sources to meet the data and business requirements.
- Perform hands-on data development to accomplish data extraction, movement, and integration, leveraging state-of-the-art tools and practices, including both streaming and batch data ingestion techniques.
- Used Scala to build data processing workflows in Apache Spark.
- Used Scala and Spark to develop a streaming data pipeline.
- Used Hadoop, Python, AWS Lambda, and PostgreSQL to maintain the backend data systems.
- Used HDFS and Apache Spark to transform and integrate intellectual property case files.
- Used Spark DataFrames to clean case file data.
- Parsed XML files into DataFrames for ingestion into the case file database (see the sketch after this list).
- Built and maintained data pipelines for ingesting data from the USPTO and CIPO trademark databases.
- Built an NLP codebase that tags trademark photos and parses trademark descriptions.
- Trained a deep learning model for detection of trademark infringement; used Jenkins and Ansible to manage production servers.
- Analyzed big data sets and constructed interactive user dashboards in Tableau and Hadoop.
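
A minimal sketch of the XML-to-DataFrame ingestion above, assuming a local file; the path, element tags, and schema are illustrative, not the actual USPTO/CIPO case file format:

```python
# Sketch: parse trademark case file XML into a Spark DataFrame.
# File path, element tags, and column names below are illustrative only.
import xml.etree.ElementTree as ET
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("casefile-ingest").getOrCreate()

def parse_casefile(path):
    """Yield one (serial, mark_text, status) tuple per case-file element."""
    root = ET.parse(path).getroot()
    for case in root.iter("case-file"):          # hypothetical tag name
        yield (
            case.findtext("serial-number"),
            case.findtext("mark-identification"),
            case.findtext("status-code"),
        )

rows = list(parse_casefile("casefiles.xml"))     # illustrative local path
df = spark.createDataFrame(rows, "serial string, mark_text string, status string")
df = df.dropna(subset=["serial"]).dropDuplicates(["serial"])  # basic cleaning
df.write.mode("overwrite").parquet("warehouse/casefiles")     # ingestion target
```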
Confidential
Business Intelligence/Data Engineer
Responsibilities:
- Used HDFS and Apache Spark to warehouse donor and social media data.
- Used Scala and Spark to develop a batch data warehouse for transactional data.
- Designed Tableau dashboards for web-based platforms and internal reporting.
- Prepared dashboards in Tableau using calculations, parameters, and calculated hierarchies.
- Used table calculations, drill-downs, and level-of-detail (LOD) expressions to perform advanced analytics and visualizations.
- Used Hadoop, Python, AWS Lambda, and PostgreSQL to maintain the data systems.
- Used Talend to build Spark ETL pipelines that import and transform large structured and unstructured datasets from internal databases and APIs.
- Designed an automated preprocessing and ingestion workflow, including historical data extraction and ongoing file loads from multiple sources in various formats such as fixed-length flat files, delimited flat files, JSON, and XML.
- Used Talend to build ETL workflows to clean, reshape, and transform datasets.
- Used ETL tools to maintain data integrity and quality.
- Used QGIS and Mapbox to perform geomatics data analytics.
- Used scikit-learn for statistical analysis: tests between groups, correlations, and projections.
- Used Hive to query data in HDFS (see the sketch below).
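
A minimal sketch of querying Hive tables over HDFS from Spark; the table and column names are hypothetical:

```python
# Sketch: query a Hive table backed by HDFS through Spark SQL.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("donor-reporting")
    .enableHiveSupport()        # use the Hive metastore for table lookups
    .getOrCreate()
)

# Aggregate donations per campaign; "donations" is an illustrative table.
summary = spark.sql("""
    SELECT campaign_id, COUNT(*) AS n_donations, SUM(amount) AS total
    FROM donations
    WHERE donated_at >= '2019-01-01'
    GROUP BY campaign_id
""")
summary.show()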
Confidential
Business Intelligence Developer/Data Engineer
Responsibilities:
- Used Spark ETL tools to integrate, cleanse, and join datasets (see the sketch at the end of this section).
- Worked with a team of two other researchers to devise methods for data profiling and representation.
- Designed and extracted a taxonomy of categorical data for survey statistical design.
- Designed workflows in Tableau.
- Provided technical and cartographic services to the research team.
Environment: Tableau, Hadoop
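
A minimal sketch of the Spark integrate/cleanse/join step mentioned above; file paths and column names are illustrative:

```python
# Sketch: integrate, cleanse, and join two survey datasets in Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("survey-integration").getOrCreate()

responses = spark.read.csv("responses.csv", header=True, inferSchema=True)
respondents = spark.read.csv("respondents.csv", header=True, inferSchema=True)

# Cleanse: normalize the join key and drop rows without one.
responses = (
    responses.withColumn("respondent_id", F.trim(F.col("respondent_id")))
             .dropna(subset=["respondent_id"])
)

joined = responses.join(respondents, on="respondent_id", how="inner")
joined.write.mode("overwrite").parquet("clean/survey_joined")
```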
Confidential
Data Engineer
Responsibilities:
- Used Python to write data ingestion scripts for data vendor APIs (see the sketch at the end of this section).
- Designed an automated preprocessing and ingestion workflow, including historical data extraction and ongoing file loads from multiple sources in various formats such as fixed-length flat files, delimited flat files, JSON, and XML.
- Transformed and loaded traffic time series data into the EDL; used Pandas and scikit-learn in Python to perform extraction, exploration, visualization, and transformation of features.
- Built and maintained full-stack systems on AWS for displaying graphics on a web-based platform.
- Designed, built, and maintained data pipelines and systems for data ingestion from APIs.
- Developed and maintained front-end web pages to render graphical data.
- Used AWS RDS and EC2 to host the data, with Highcharts as the front-end visualization interface to display the data.
Environment: AWS, RDS, EC2, Highcharts, Hadoop.
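
A minimal sketch of the vendor-API ingestion and Pandas feature work above; the endpoint, parameters, and field names are all hypothetical placeholders:

```python
# Sketch: ingest traffic time series from a data vendor API into Pandas.
# Endpoint URL, query parameters, and JSON keys are hypothetical.
import pandas as pd
import requests

API_URL = "https://api.example-vendor.com/v1/traffic"   # placeholder URL

resp = requests.get(API_URL, params={"region": "toronto", "days": 30}, timeout=30)
resp.raise_for_status()

# Flatten the JSON payload into a DataFrame and derive a simple feature.
df = pd.DataFrame(resp.json()["observations"])          # hypothetical key
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp").sort_index()
df["volume_rolling_24h"] = df["volume"].rolling("24h").mean()  # example feature

df.to_parquet("traffic_features.parquet")   # hand-off to the EDL load step
```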
Confidential
Open Data Analyst
Responsibilities:
- Wrote PySpark jobs to ingest and process data in the data warehouse (a sketch follows this list).
- Used Tableau to produce custom maps based on census data.
- Used Hadoop, Python, and PostgreSQL to maintain the backend data systems.
- Created custom tabulation views of data using Tableau and other ETL tools.
- Segmented datasets for use on the Namara platform.
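
A minimal sketch of a PySpark job that builds a custom tabulation view of census data; paths and column names are illustrative:

```python
# Sketch: PySpark job that ingests census data and builds a tabulation view.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("census-tabulation").getOrCreate()

census = spark.read.csv("census_profile.csv", header=True, inferSchema=True)

# Tabulate population by region and age group, one column per sex.
tab = (
    census.groupBy("region", "age_group")
          .pivot("sex", ["male", "female"])
          .agg(F.sum("population"))
          .orderBy("region", "age_group")
)
tab.createOrReplaceTempView("census_tabulation")  # expose as a queryable view
tab.write.mode("overwrite").parquet("warehouse/census_tabulation")
```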
Confidential
Intern/Data Engineer
Responsibilities:
- Worked on an Apache Spark (Scala) data processing pipeline in Bluemix.
- Connected databases and other sources to Tableau and built Tableau dashboards for student profile data.
- Produced prototype projects for internship teams.
- Prepared projects on the IBM Bluemix platform.