
Data Engineer (Telemetry) Resume


San Jose

SUMMARY:

  • Strong expertise in Hadoop big data, SQL, and HQL. Worked on a Kafka proof of concept for an end-to-end implementation to receive a real-time data flow from an AWS server.
  • Experience in data visualization, data analytics, data integration, and data quality using Python. Used the Matplotlib and Django libraries in Python to present data visualizations and trends.
  • Used NLTK libraries to detect customer satisfaction from verbatim comments (tokenization and standardization are part of this process).
  • Queried large data sets to process and refine data for downstream systems. Automated the audit system to recognize errors on the fly, such as null percentage and validity checks (a minimal sketch follows this list).
  • Expertise in RDBMS and in Agile, Scrum, and waterfall methodologies.
  • Front-end experience with CSS, HTML, and JavaScript, along with good knowledge of backend technologies including Python. Tableau expertise with 12 months of experience.
  • Strong understanding of the project life cycle and the Software Development Life Cycle (SDLC). Experience with WordPress, MS Office tools, CMS, and multiple platforms (Windows and Mac). Strong command of MS Excel, including the VLOOKUP, MATCH, and INDEX functions.
  • Good industry knowledge, analytical and problem-solving skills, and the ability to work well in a team as well as individually. Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing data mining and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Hands-on experience with big data tools such as Hadoop, Spark, Hive, Pig, Impala, PySpark, and Spark SQL.
  • Good knowledge of proofs of concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared it for data exploration using data munging.
  • Highly creative, innovative, committed, intellectually curious, and business savvy, with good communication and interpersonal skills.
  • Experience using various packages in R and Python, including ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, Beautiful Soup, and rpy2.
  • Extensive experience in data visualization, including producing tables, graphs, and listings using tools such as Tableau.
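A minimal sketch of the automated null-percentage audit described above, assuming pandas DataFrames as the batch format; the threshold and the toy columns are illustrative, not the original implementation:

```python
import pandas as pd

def audit_batch(df: pd.DataFrame, max_null_pct: float = 5.0) -> list:
    """Flag columns whose null percentage exceeds a threshold."""
    errors = []
    null_pct = df.isna().mean() * 100  # per-column null percentage
    for col, pct in null_pct.items():
        if pct > max_null_pct:
            errors.append(f"{col}: {pct:.1f}% nulls exceeds {max_null_pct}%")
    return errors

# Toy batch: 'metric' is 50% null and should be flagged.
batch = pd.DataFrame({"device_id": [1, 2, 3, 4], "metric": [0.5, None, None, 0.9]})
for err in audit_batch(batch):
    print(err)
```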

TECHNICAL SKILLS:

Programming & Scripting Languages: Python, R, C, C++, Java, JCL, COBOL, HTML, CSS, JSP, JavaScript

Databases: SQL Server 2014/2012/2008/2005/2000, MS Access, Oracle 12c/11g/10g/9i, Teradata, Hadoop

Statistical Software: SPSS, R, SAS.

Web Packages: Google Analytics, Adobe Test&Target, WebTrends

Bigdata Ecosystem: HDFS, Pig, MapReduce, Hive, Sqoop, Flume, HBase, Storm, Kafka, Elasticsearch, Redis

Statistical Methods: time series, regression models, splines, confidence intervals, principal component analysis and dimensionality reduction, bootstrapping

BI Tools: Microsoft Power BI, Tableau, SSIS, SSRS, SSAS, Business Intelligence Development Studio (BIDS), Visual Studio, Crystal Reports, Informatica 6.1.

Database Design Tools and Data Modeling: MS Visio, ERwin 4.5/4.0, Star Schema/Snowflake Schema modeling, fact & dimension tables, physical & logical data modeling, normalization and de-normalization techniques, Kimball & Inmon methodologies

Cloud: AWS (EC2, S3).

Big Data / Grid Technologies: Cassandra, Coherence, MongoDB, ZooKeeper, Titan, Elasticsearch, Storm, Kafka, Hadoop

Tools and Utilities: SQL Server Management Studio, SQL Server Enterprise Manager, SQL Server Profiler, Import & Export Wizard, Visual Studio .NET, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office, Excel Power Pivot, Excel Data Explorer, Tableau, JIRA, Spark MLlib

PROFESSIONAL EXPERIENCE:

Confidential, San Jose

Data Engineer (Telemetry)

Responsibilities:

  • Worked as a data engineer on huge raw data sets (telemetry data).
  • Performed data classification and text analysis using NLTK libraries in Python. Collected and compiled datasets from various data-generating sources (AWS, MySQL servers, Oracle databases).
  • Ingested the data by creating tables in Hive and optimized the ingested data by altering and refining it through HQL.
  • Resolved compatibility issues by working on different areas of the data architecture, including data ingestion and pipeline design. Worked on machine learning and advanced data processing to scrutinize cumulative customer trends.
  • Developed and implemented scripts using machine-learning algorithms such as the naïve Bayes classifier, k-means clustering, and random forests for classification, regression, and clustering (see the sketch after this list).
  • Created plots, bar charts, graphs, and histograms of complex data using Matplotlib in Python. Maintained the source code repository in Subversion and handled the branching, tagging, and merging process.
  • Researched, designed, and developed computer software systems and applications that require advanced computational and quantitative methodologies.
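A minimal sketch of the NLTK-based text classification mentioned above, pairing NLTK tokenization with a scikit-learn naïve Bayes model; the sample comments and labels are invented for illustration:

```python
from nltk.tokenize import TreebankWordTokenizer  # pure-Python tokenizer, no corpus download
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

comments = ["great product, works fine", "terrible support, very slow",
            "love the new update", "keeps crashing, waste of money"]
labels = [1, 0, 1, 0]  # 1 = satisfied, 0 = dissatisfied (invented)

# Standardize to lowercase and tokenize with NLTK inside the vectorizer.
vectorizer = CountVectorizer(tokenizer=TreebankWordTokenizer().tokenize, lowercase=True)
X = vectorizer.fit_transform(comments)

model = MultinomialNB().fit(X, labels)
print(model.predict(vectorizer.transform(["the update is great"])))  # expect [1]
```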

Confidential, Omaha, NE

Data Scientist

Responsibilities:

  • Performed data profiling to learn about behavior across features such as traffic pattern, location, date, and time. Evaluated models using cross-validation, the log-loss function, and ROC curves, and used AUC for feature selection (a minimal sketch follows this section).
  • Detected near-duplicate news by applying NLP methods (word2vec) and developing machine learning models such as label spreading and clustering.
  • Collected data needs and requirements by interacting with other departments.
  • Used principal component analysis in feature engineering to analyze high-dimensional data.
  • Used the k-means clustering technique to identify outliers and classify unlabeled data.
  • Ensured that the model had a low false positive rate.
  • Created and designed reports that use the gathered metrics to infer and draw logical conclusions about past and future behavior.
  • Used MLlib, Spark's machine learning library, to build and evaluate different models.
  • Created data quality scripts using SQL and Hive to validate successful data loads and the quality of the data. Created various types of data visualizations using Python and Tableau.
  • Developed a MapReduce pipeline for feature extraction using Hive.
  • Applied various machine learning algorithms and statistical models, such as decision trees, regression models, neural networks, SVMs, and clustering, to identify volume, using the scikit-learn package in Python and MATLAB.
  • Performed data analysis using Hive to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database.
  • Performed data cleaning, feature scaling, and feature engineering using the pandas and NumPy packages in Python.
  • Implemented a rule-based expert system from the results of exploratory analysis and information gathered from people in different departments.
  • Analyzed traffic patterns by calculating autocorrelation with different time lags.
  • Addressed overfitting by implementing regularization methods such as L2 and L1.
  • Performed multinomial logistic regression, random forests, decision trees, and SVM to classify whether a package will be delivered on time for a new route.
  • Communicated the results to the operations team to support decision making.

Environment: Impala, Linux, Spark, Tableau Desktop, Python 2.x, CDH5, HDFS, Hadoop 2.3, Hive, SQL Server 2012, Microsoft Excel, NLP
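A minimal sketch of the evaluation approach above (cross-validated log loss and ROC AUC with scikit-learn); the synthetic dataset stands in for the real features:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the real feature matrix and labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# scikit-learn exposes log loss as a negated score, so flip the sign back.
logloss = -cross_val_score(model, X, y, cv=5, scoring="neg_log_loss").mean()
auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
print(f"CV log loss: {logloss:.3f}, CV ROC AUC: {auc:.3f}")
```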

Confidential, Santa Ana, CA

Data Scientist/Data Analyst

Responsibilities:

  • Developed Python scripts to automate the data sampling process. Ensured data integrity by checking for completeness, duplication, accuracy, and consistency.
  • Worked on model selection based on confusion matrices and minimized the Type II error (a minimal sketch follows this section).
  • Generated data analysis reports using Matplotlib and Tableau, and successfully delivered and presented the results to C-level decision makers.
  • Worked on data cleaning and reshaping; generated segmented subsets using NumPy and pandas in Python.
  • Continuously collected business requirements throughout the project life cycle.
  • Generated a cost-benefit analysis to quantify the model implementation compared with the former situation.
  • Identified the variables that significantly affect the target.
  • Applied various machine learning algorithms and statistical models, such as decision trees, logistic regression, and gradient boosting machines, to build predictive models using the scikit-learn package in Python.
  • Conducted model optimization and comparison using stepwise selection based on AIC values.
  • Wrote and optimized complex SQL queries involving multiple joins and advanced analytical functions to extract and merge large volumes of historical data stored in Oracle 11g, validating the ETL-processed data in the target database.

Environment: NumPy, pandas, Tableau 7, Python 2.6.8, Matplotlib, Oracle 10g, SQL, scikit-learn, MongoDB client
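A minimal sketch of trading Type II errors (false negatives) against Type I errors by tuning the decision threshold on a confusion matrix; the data and thresholds are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = model.predict_proba(X_te)[:, 1]  # probability of the positive class

# Lowering the threshold reduces false negatives (Type II errors)
# at the cost of more false positives.
for threshold in (0.5, 0.3):
    preds = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, preds).ravel()
    print(f"threshold={threshold}: false negatives={fn}, false positives={fp}")
```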

Confidential, Stamford, CT

Data Architect/Data Modeler

Responsibilities:

  • Developed integration jobs to transfer data from source systems to Hadoop.
  • Installed Talend Studio.
  • Wrote technical design documents for the transformation processes.
  • Applied business rules to the data being transferred.
  • Allocated tasks for the ETL and reporting team.
  • Communicated effectively with the client and their internal development team to deliver product functionality requirements.
  • Architected and designed data warehouse ETL processes.
  • Demonstrated the PoC built for the prospective customer, provided guidance, and gathered feedback for backend ETL testing on SQL Server 2008 using SSIS.
  • Created integration jobs to back up a copy of the data on a network file system.
  • Designed and implemented the ETL data model and created staging, source, and target tables in a SQL Server database (a stand-in sketch follows this section).
  • Held requirements-gathering and analysis meetings with business users and documented the meeting outcomes.

Environment: Hadoop, MS Office, Talend Studio, ETL, ODS, OLAP, SQL Server 2008.
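The integration jobs above were built in Talend; as a stand-in illustration of the staging-load step, here is a Python sketch using pandas and SQLAlchemy, with hypothetical connection strings, table names, and business rule:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connections; real credentials and drivers would differ.
source = create_engine("oracle+oracledb://user:pwd@source-host/ORCL")
staging = create_engine("mssql+pyodbc://user:pwd@staging_dsn")

# Extract recent rows from the source, apply a business rule, land them in staging.
df = pd.read_sql("SELECT * FROM orders WHERE order_date >= SYSDATE - 1", source)
df = df[df["amount"] > 0]  # example rule: drop non-positive amounts
df.to_sql("stg_orders", staging, if_exists="append", index=False)
```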

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Implemented a job that reads electronic medical records, extracts the data into an Oracle database, and generates an output. Analyzed the data and provided insights about the customers using Tableau.
  • Designed, implemented, and automated modeling and analysis procedures on existing and experimentally created data.
  • Created dynamic linear models to perform trend analysis on customer transactional data in Python.
  • Increased the pace and confidence of the learning algorithm by combining state-of-the-art technology and statistical methods.
  • Parsed data, producing concise conclusions from raw data in a clean, well-structured, and easily maintainable format. Developed clustering models for customer segmentation using Python (a minimal sketch follows this section).
  • Developed entire frontend and backend modules using Python on the Django web framework.
  • Implemented the presentation layer with HTML, CSS, and JavaScript.
  • Wrote stored procedures in Oracle.
  • Optimized database queries to improve performance.
  • Designed and developed a data management system using Oracle.

Environment: Python 2.x, Tableau, Oracle, MySQL 5.x, HTML5, CSS3, JavaScript, Shell, Linux & Windows, Django.
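A minimal sketch of customer segmentation with k-means, as used above; the two features (annual spend, order frequency) and the cluster count are invented stand-ins for the real transactional data:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Toy customers drawn around three behavioral profiles: [annual spend, order frequency].
centers = [[500, 5], [2000, 20], [8000, 2]]
customers = np.vstack([rng.normal(c, [200, 2], size=(100, 2)) for c in centers])

X = StandardScaler().fit_transform(customers)  # scale features before clustering
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(segments))  # customers per segment
```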

Confidential

Data Analyst

Responsibilities:

  • Applied Business Objects best practices during development, with a strong focus on reusability and better performance.
  • Developed and executed load scripts using the Teradata client utilities MultiLoad, FastLoad, and BTEQ.
  • Responsible for developing and testing conversion programs that import data from text files into an Oracle database using Perl shell scripts and SQL*Loader (a minimal sketch follows this section).
  • Developed Tableau visualizations and dashboards using Tableau Desktop.
  • Used graphical entity-relationship diagramming to create new database designs via an easy-to-use graphical interface.
  • Formatted the data sets read into SAS by using the FORMAT statement in the DATA step as well as PROC FORMAT.
  • Worked with the ETL team to document the transformation rules for data migration from OLTP to the warehouse environment for reporting purposes.
  • Coordinated with various business users, stakeholders, and SMEs to obtain functional expertise, review designs and business test scenarios, participate in UAT, and validate financial data.

Environment: Business Objects, Oracle SQL Developer, PL/SQL, MS SQL Server, TOAD, Tableau, Informatica, SQL*Plus, SQL*Loader, XML.
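The original conversion programs were written in Perl; as a stand-in illustration, here is a Python sketch that generates a SQL*Loader control file and invokes Oracle's sqlldr utility, with hypothetical table, columns, paths, and credentials:

```python
import subprocess
from pathlib import Path

# Hypothetical control file describing a pipe-delimited text import.
control = Path("load_records.ctl")
control.write_text(
    "LOAD DATA\n"
    "INFILE 'records.txt'\n"
    "APPEND INTO TABLE records\n"
    "FIELDS TERMINATED BY '|'\n"
    "(record_id, name, created DATE 'YYYY-MM-DD')\n"
)

# sqlldr is Oracle's bulk-load CLI; the log and bad files help verify the load.
subprocess.run(
    ["sqlldr", "userid=user/pwd@ORCL", f"control={control}",
     "log=load.log", "bad=load.bad"],
    check=True,
)
```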
