Big Data Developer Resume
Phoenix, AZ
SUMMARY:
- Solid programming experience with Java 7, Python 2.7/3.5, R 3.5.1, HTML5, CSS3, and environments such as Linux and UNIX.
- Technical experience using Hortonworks 2.6.5, Databricks 2.4.2, and the Hadoop working environment, including Hadoop 2.8.3, MapReduce, HDFS 2, YARN, Hive 1.2.2, Sqoop 1.4.7, Flume 1.5.0.1, Apache Spark 2.2.1, and Kafka 1.3.2.
- Comprehensive knowledge of core Java concepts, the Collections Framework, object-oriented design, and exception handling, with good working knowledge of the Eclipse IDE 4.7.
- Experience testing applications using JUnit 4.12, with JIRA for bug tracking under Agile project methodology.
- Experience working with source and version control systems such as Bitbucket, Git 2.12, and GitHub.
- Worked with different file formats such as text, Avro, and Parquet.
- Worked with Amazon Web Services, using EC2 for computation and S3 as a storage mechanism.
- Implemented MLlib algorithms for training and testing different models (a minimal sketch follows this list).
- Hands-on experience with Python libraries such as Matplotlib 2.2.2, NumPy, SciPy, and Pandas.
- Strong knowledge of the design and analysis of ML/data science algorithms such as classification, association rules, clustering, and regression; descriptive, predictive, and prescriptive analytics models; and machine learning (ML), deep learning (DL), natural language processing (NLP), text analytics, data mining, unstructured data parsing, and sentiment analysis.
- Neural network libraries: TensorFlow r1.8.0, Keras 2.2.1, etc.
- Hands-on working experience with brain-computer interfaces (Emotiv technology).
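The MLlib work noted above (training and testing different models) generally follows the pattern below. This is a minimal sketch with hypothetical toy data and column names, not code from any of the projects listed here.

```python
# Minimal MLlib train/test sketch; the toy data and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("mllib-demo").getOrCreate()

# Hypothetical feature vectors with binary labels.
df = spark.createDataFrame(
    [(Vectors.dense([0.0, 1.1]), 0.0),
     (Vectors.dense([0.1, 1.2]), 0.0),
     (Vectors.dense([0.2, 0.9]), 0.0),
     (Vectors.dense([2.0, 1.0]), 1.0),
     (Vectors.dense([2.2, 1.3]), 1.0),
     (Vectors.dense([1.9, 1.4]), 1.0)],
    ["features", "label"])

train, test = df.randomSplit([0.7, 0.3], seed=42)   # training/testing split
model = LogisticRegression(maxIter=10).fit(train)   # train the model
predictions = model.transform(test)                 # score the held-out data
print("AUC:", BinaryClassificationEvaluator().evaluate(predictions))
spark.stop()
```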
TECHNICAL SKILLS:
Programming Languages: C, Java 7, Python 2.7/3.5.0, R 3.5.1
Web Programming & Scripting Languages: HTML5, CSS3, JavaScript 6
Databases: Oracle 10g, MySQL 5/8, Cassandra 2.2, HBase 2.0.0
Operating Systems: Windows 7/8.1/10, Linux, xv6 (Unix), Android
Frameworks: Hadoop 2.8.3, Apache Spark 2.2.1
Methodologies: Agile, Waterfall
Software & Analysis Tools: Microsoft Excel 2015, Microsoft Access 2015, Microsoft Word 2015, MATLAB 2015, SciLab, Tableau, Visual Studio 2017, Eclipse Oxygen, NetBeans IDE 8.2, RStudio 1.1.456, Amazon S3, Anaconda 5.1.0, RapidMiner 7.2, KNIME 3.5, PyCharm 2.0, Emotiv, MySQL Workbench
SCM Tools: Bitbucket, Git 2.12, GitHub
Hadoop Ecosystem: HDFS 2, MapReduce, Hive 1.2.2
ETL: Sqoop 1.4.7, Flume 1.5.0
Bug Tracking Tools: JIRA
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Big Data Developer
Responsibilities:
- Maintaining, enhancing, and upgrading the Cornerstone data ingestion capabilities.
- Creating and managing nodes that use Java JARs, Python, and shell scripts to schedule jobs and customize data ingestion.
- Implementing code changes in existing Java, Python, and shell-script modules for enhancements.
- Writing MySQL queries in MySQL Workbench for efficient retrieval of ingested data (a minimal sketch follows the environment line below).
- Rewriting existing scripts in Python and shell for more efficient execution; the data formats handled are XML, JSON, Parquet, and text.
- Resolving JIRA tickets by debugging code errors; Hive is used for data visualization.
- Agile is used for project management, and Bitbucket is used for source-code tracking.
Environment: Python, Java, Eclipse Oxygen, HDFS, Hive, MySQL Workbench, Linux/Unix Bash shell scripting
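As referenced above, retrieval of ingested data is typically verified with straightforward MySQL queries. Below is a minimal sketch from Python; the connection details and the ingestion_audit table and its columns are hypothetical placeholders, not the actual Cornerstone schema.

```python
# Minimal sketch: query ingestion metadata from MySQL.
# Connection details and the ingestion_audit table are hypothetical.
import mysql.connector

conn = mysql.connector.connect(
    host="localhost", user="etl_user", password="***", database="cornerstone")
cursor = conn.cursor()

# Fetch the most recent failed ingestion runs and their row counts.
cursor.execute(
    "SELECT feed_name, ingest_date, row_count, status "
    "FROM ingestion_audit "
    "WHERE status = 'FAILED' "
    "ORDER BY ingest_date DESC LIMIT 10")
for feed_name, ingest_date, row_count, status in cursor.fetchall():
    print(feed_name, ingest_date, row_count, status)

cursor.close()
conn.close()
```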
Confidential, New York, NY
Big Data Developer
Responsibilities:
- Implemented solutions for ingesting data from various sources and processing it using Big Data technologies such as Hive, Sqoop, and MapReduce.
- Developed multiple MapReduce jobs in Hive for data cleaning and pre-processing.
- Improved data organization using techniques such as Hive partitioning and bucketing (a minimal sketch follows the environment line below).
- Handled data import and export to and from HDFS and Hive using Sqoop.
- Hive was used for data analysis, with SparkContext and Spark SQL used to optimize the analysis.
- Spark RDDs were used to store data and perform in-memory computations.
- Anaconda was used as the IDE for developing the model, and WEKA and RapidMiner were used for generating statistics and analysis.
Environment: Hadoop 2.8.3, Hive 1.2.2, Apache Spark 2.2.1, Anaconda 5.1.0, WEKA, and RapidMiner 7.2
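The partitioning and bucketing mentioned above can be expressed through the Spark DataFrame writer against a Hive metastore. This is a minimal sketch; the database, table, and column names are hypothetical placeholders, not the project's actual schema.

```python
# Minimal sketch of Hive-style partitioning and bucketing via Spark;
# sales.orders_raw / sales.orders_clean and their columns are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-layout-demo")
         .enableHiveSupport()
         .getOrCreate())

orders = spark.table("sales.orders_raw")   # hypothetical source table

(orders
 .write
 .partitionBy("ingest_date")               # prune scans by ingest date
 .bucketBy(16, "customer_id")              # speed up joins on the customer key
 .sortBy("customer_id")
 .format("parquet")
 .mode("overwrite")
 .saveAsTable("sales.orders_clean"))

spark.stop()
```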
Confidential, Austin, TX
Big Data Developer
Responsibilities:
- PySpark and Spark were used for data cleaning and preprocessing.
- The Seaborn library was used for data visualization and analysis.
- Predictive analysis was performed on Confidential shopping-cart clickstream data using the Keras library running on top of TensorFlow (a minimal sketch follows the environment line below).
- A decision-rule classifier and the treatment learner TAR3 were used to find paths with a high likelihood of leading to a specific outcome (abandoning vs. purchasing user).
- Used Agile methodology for project management, and GitHub was used for source-code tracking.
Environment: Pandas, TensorFlow r1.8.0, Keras, PySpark 2.2.1, Spark 2.2.1.
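The clickstream prediction described above (purchase vs. abandonment) maps onto a small Keras binary classifier. Below is a minimal sketch; the feature matrix and labels are random stand-ins for the real (confidential) dataset, and the layer sizes are illustrative only.

```python
# Minimal Keras-on-TensorFlow sketch; the data is a hypothetical stand-in
# (purchase = 1, abandon = 0), not the actual clickstream features.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

X = np.random.rand(1000, 20)                 # 1000 sessions, 20 features each
y = np.random.randint(0, 2, size=(1000,))    # purchase/abandon labels

model = Sequential([
    Dense(64, activation="relu", input_shape=(20,)),
    Dense(32, activation="relu"),
    Dense(1, activation="sigmoid"),          # probability of purchase
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```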
Confidential, Houston, TX
Big Data Engineer
Responsibilities:
- Stored free Amazon datasets on Amazon S3 and used Databricks to process the data.
- Combined two datasets in XML and CSV format, containing customer demographics and ratings, to recommend books predicted to suit each user's taste.
- Collaborative filtering with the ALS matrix factorization algorithm was used to generate recommendations (a minimal sketch follows the environment line below).
- Analyzed the dataset using Spark MLlib, Python (Pandas), and Seaborn.
- Used k-fold cross-validation to divide the training and test data and obtain an optimal split without data skew.
- ROC curves and confusion matrices were used for evaluation, and visualization was done using ggplot.
- Genetic algorithm analysis was used to find the best possible solution from the available options.
- Processed the datasets in RStudio using the GA, neuralnet, and nnet packages along with covariance and correlation analysis.
- Principal Component Analysis was used to identify the major contributors to the distribution, and ensemble learning with bagging and boosting techniques was explored.
Environment: Python 3.5.0 - Pandas, Seaborn libraries, Apache Spark 2.2.1 - MLlib, RStudio 1.1.456 - Neuralnet, NNet, Ga packages, R programming
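The collaborative-filtering recommendations described above correspond to Spark MLlib's ALS (alternating least squares) estimator. This is a minimal sketch; the toy ratings and column names are hypothetical, not the actual book dataset.

```python
# Minimal ALS collaborative-filtering sketch; ratings data and column
# names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator

spark = SparkSession.builder.appName("book-recs-demo").getOrCreate()

ratings = spark.createDataFrame(
    [(1, 10, 4.0), (1, 20, 2.0), (2, 10, 5.0), (2, 30, 3.0),
     (3, 20, 4.0), (3, 10, 4.0), (4, 30, 5.0), (4, 20, 1.0)],
    ["user_id", "book_id", "rating"])

train, test = ratings.randomSplit([0.8, 0.2], seed=42)

als = ALS(userCol="user_id", itemCol="book_id", ratingCol="rating",
          rank=10, maxIter=10, coldStartStrategy="drop")
model = als.fit(train)

rmse = RegressionEvaluator(metricName="rmse", labelCol="rating",
                           predictionCol="prediction").evaluate(model.transform(test))
print("RMSE:", rmse)

model.recommendForAllUsers(5).show(truncate=False)   # top-5 books per user
spark.stop()
```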
Confidential, Houston, TX
Big Data Engineer
Responsibilities:
- Acquired iris images from the Kaggle repository and enhanced them using grayscale conversion and histogram equalization.
- Pre-processed the images using median filtering and normalization (a minimal sketch follows the environment line below).
- Used morphological processing to extract image components representing the shapes of objects, and partitioned the images using segmentation.
- Extracted features and performed recognition using a multilayer feedforward network.
- Compressed the images using JPEG standards.
- Research and implementation of a brain-computer interface using the Emotiv EPOC Plus headset.
- The user interface Xavier Control Panel was trained to detect the user’s facial movements.
- Trained the user interface to pick a color and move the brush.
- Brain waves were captured and compared to map keys to actions using EmoKey.
- Five gaming apps were created in Unity using C# and JavaScript for demonstration.
- Used CyKit to obtain real-time patterns, which were analyzed using SciLab and MATLAB; Spark and Python were used for data cleaning and processing.
Environment: MATLAB, Emotiv EPOC Plus headset and Xavier Control Panel interface, CyKit, SciLab, C#, JavaScript, Unity, Apache Spark 2.2.1
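The image pre-processing steps listed above (grayscale conversion, histogram equalization, median filtering, normalization) can be sketched with OpenCV as below; "iris.png" is a hypothetical file name standing in for an image from the Kaggle set.

```python
# Minimal pre-processing sketch with OpenCV; "iris.png" is a hypothetical input.
import cv2
import numpy as np

img = cv2.imread("iris.png")                        # load the raw iris image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)        # convert to grayscale
equalized = cv2.equalizeHist(gray)                  # histogram equalization
denoised = cv2.medianBlur(equalized, 5)             # median filtering
normalized = denoised.astype(np.float32) / 255.0    # normalize to [0, 1]

cv2.imwrite("iris_preprocessed.png", (normalized * 255).astype(np.uint8))
```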