Big Data Developer Resume Bloomington, IL - Hire IT People

SUMMARY:

7 years of total IT experience in Big Data Analysis and development, 5+ years of experience in Data Science, Information Availability, Information Governance for various domains
Experience in design and development of applications using Hadoop and its ecosystem components such as Hive, Spark, Scala, Sqoop, Kafka, HBase and YARN
Excellent knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode and DataNode
Hands-on experience with Scala language features - language fundamentals, Classes, Objects, Traits, Collections, Case Classes, Higher Order Functions, Pattern Matching, Extractors, etc.
Experience on Hadoop Distributions HDP 2.6.x and CDH 5.x
Experience in developing Spark Streaming applications using Scala to consume real-time transactions via Kafka topics
Experience in building applications using Spark Core, Spark SQL, DataFrames and Spark Streaming
Expertise in using SQL queries to extract data from RDBMS databases - MySQL, DB2, Oracle and PostgreSQL
Experience in importing data from RDBMS databases MySQL, Oracle and DB2 into a Hadoop data lake using Sqoop
Experience with the data ingestion tool NiFi, used to extract data from various data sources into a Hadoop data lake
Experience on job scheduling tools - Control-M and Oozie
Experience on distributed SQL engines such as Presto to enable low latency data extractions from Hadoop for analytical purposes
Experienced in AWS - S3, EC2, RDS and EMR
Experience in developing Spark applications using DataFrames and Datasets; transformed data using PySpark and Spark SQL, and applied performance tuning techniques based on Catalyst and Tungsten
Hands-on experience migrating existing data from traditional warehouse locations to a Hadoop cluster and creating a common data lake and consumption data mart to enable regulatory and MI reporting
Experience on NoSQL databases HBase, MongoDB and Cassandra
Experience in real-time messaging systems such as Kafka to ingest real-time streaming data into Hadoop
Worked with different bug tracking tools such as Remedy and Jira
Experience on developing Spark batch applications to ingest data into common data lake using Scala
Experience in analyzing data using HiveQL
Experience in importing and exporting data using Sqoop from RDBMS to HDFS and vice-versa
Experience in architecting, designing, implementing and deploying the Data Protection Software suite and Digital Investigation software suite for diverse environments
Experience in building data pipelines, data engineering, data mining and programming machine learning algorithms (supervised and unsupervised) to gather insights from the data
Proficient in Machine Learning techniques (Decision Trees, Linear/Logistic Regression, Random Forest, K-Nearest Neighbors) and Statistical Modeling in Forecasting/Predictive Analytics, Hypothesis Testing, Factor Analysis/PCA
Experience in analyzing and manipulating data and developing machine learning models with Python using Scikit-Learn, NumPy, SciPy and Pandas
Experience in analyzing and manipulating data and developing machine learning models with R using libraries ggplot2, evir, Ecdat, car, caret, Cubist, mlbench, AppliedPredictiveModeling, plyr and pROC
Experience in quantitative research methods and analysis (ANOVA, ARIMA, ARMA, factor analysis, regression analysis, SVM, Naïve Bayes, Anomaly detection)
Experience in visualizing infographics to deliver meaningful insights of data using RShiny & Tableau
Excellent networking and communication with all levels of stakeholders as appropriate, including executives, application developers, business users, and customers
Experience working with Agile and Waterfall methodologies

PROFESSIONAL EXPERIENCE:

Confidential, Bloomington, IL

Big Data Developer

Worked with Project Manager, Business Leaders and Technical teams to finalize requirements and create solution design & architecture
Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, Hive, Spark, Scala and Sqoop
Designed and developed Spark code using Scala, PySpark & Spark SQL for high-speed data processing to meet critical business requirements
Analyzed the SQL scripts and designed the solution to implement using PySpark
Implemented RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext
Developed algorithms & scripts in Hadoop to import data from source system and persist in HDFS (Hadoop Distributed File System) for staging purposes
Developed Shell scripts to perform Hadoop ETL functions like Sqoop, create external/internal Hive tables, initiate HQL scripts
Developed scripts in Hive to perform transformations on the data and load to target systems for reporting
Worked on all four stages - data ingest, data transform, data tabulate and data export
Maintained fully automated CI/CD pipelines for code deployment (GitLab/Jenkins/IBM UrbanCode Deploy)
Built code using Java, Spring Boot, Maven and Jenkins for building and automating our data workflow
Performed JUnit tests and functional tests to validate our code
Actively managed, improved, and monitored cloud infrastructure on AWS - EC2, S3, and EMR
Wrote Puppet manifests and modules to deploy, configure, and manage servers for internal DevOps process

Environment: Cloudera Hadoop, HDFS, Yarn, Java, Spring Boot, Maven, Jenkins, Gitlab, Git, Hive, PySpark, Spark SQL, Sqoop, MS SQL Server, Oracle, SQL/ NoSQL, Linux, Puppet, Tableau

Confidential, Iselin, NJ

Big Data Consultant

Worked closely with customers to understand their current technical environment, key business drivers and future technology requirements
Developed project proposals and Statements of Work based on the gathered requirements and the proposed solution
Loaded data from different relational data sources into HDFS using Sqoop and exported them to partitioned Hive tables
Designed both Managed and External tables in Hive to optimize performance
Worked with various file formats such as Parquet, Avro, ORC, CSV, flat files and JSON
Exported and Imported data into HDFS and Hive using Sqoop
Developed end to end data processing pipelines that begin with receiving data using distributed messaging systems Kafka through persistence of data into HBase
Implemented Kafka security features using SSL, initially without Kerberos; later set up Kerberos with users and groups for finer-grained security and to enable more advanced security features
Installed Kerberos secured Kafka cluster with no encryption on Dev and Prod
Installed Ranger in all environments for Second Level of security in Kafka Broker
Designed and implemented a Kafka producer application to produce real-time data using Apache Kafka Connect; used Change Data Capture (CDC) software and the Oracle GoldenGate real-time data replication tool
Implemented different data formatter capabilities and publishing to multiple Kafka Topics
Implemented Kafka High level consumers to get data from Kafka partitions and move into HDFS
Used Kafka HDFS connector to export data from Kafka topics to HDFS files in a variety of formats and integrate with Apache Hive and then into HBase
Integrated Apache Kafka with Elasticsearch using Kafka Elasticsearch Connector to stream all messages from different partitions and topics into Elasticsearch for search and analysis
Worked on Kafka and REST API to collect and load the data on HBase and Hive
Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted the results into HBase
Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala
Automated build and deployment using Jenkins to reduce human error and speed up production processes
Maintained build profiles in Team Foundation Server and Jenkins for the CI/CD pipeline
Built statistical models on AWS EMR by uploading data in S3 and creating instance on EC2
Performed real-time streaming of data from different payment systems (EDMi) and published it to Kafka topics
Used PySpark, Spark MLlib to perform Classification, Regression and Clustering on data
Used Spark Streaming to aid with real-time analytics on data coming in through Kafka pipelines
Actively developed predictive models and strategies for effective fraud detection in credit and customer banking activities, using k-Means clustering in Python (PySpark)
Developed a linear regression model to predict a continuous measurement for improving observations on credit data, using Spark with the Python API (PySpark)
Assisted senior data scientist in performing text mining on customer review data using topic modeling and sentiment classification
Performed k-Means clustering in R to understand customer backgrounds and segment customers by transaction behavior, supporting customized product offerings, strengthening existing profitable relationships and reducing customer churn
Built interactive dashboards for business using Tableau

Environment: Cloudera Hadoop, HDFS, Yarn, MapReduce, Scala, Hive, Spark, PySpark, Spark SQL, HBase, Sqoop, Kafka, MS SQL Server, Oracle, SQL/NoSQL, Linux, Python, R, NumPy, SciPy, Pandas, Scikit-Learn, Tableau

Confidential, Denver, CO

Big Data Analyst

Used R and Python to perform exploratory data analysis and build visualization components
Developed audience extension models using machine learning algorithms for categorical data, including decision trees, random forest and logistic regression (Hadoop, Python, R)
Performed web scraping using BeautifulSoup library to extract data for building graphs and visualizations
Developed ARIMA and EWMA forecasting models to perform predictive analytics
Developed prediction model applying Classification using Decision Tree (J48) classifier
Developed strategic and analytical dashboards using Tableau
Generated KPIs for customer satisfaction survey results; developed Tableau workbooks from multiple data sources using data blending; developed Pareto charts, stacked bar graphs, histograms and scatter plots
Worked with team of developers to design, develop and implement a BI solution for Sales, Product and Customer KPIs - Pareto Analysis

Environment: Hortonworks Hadoop, HDFS, Yarn, Hive, Python, R, MS SQL Server, Oracle 11g R2, MongoDB

Confidential, Fairfax, VA

Geospatial Data Analyst - Research

Provided guidance and organized data access based on database privileges
Provided solutions to the customer to streamline data to work across multiple software platforms
Completed ad hoc research requests and surveys by interpreting data questions
Categorized multiple sources of data, including real-time or dynamic, and imagery
Collected data from internal and external sources and conducted analysis using inferential statistical techniques
Used R statistical software for effective analysis by hypothesis testing to validate data and interpretations
Collected data using SQL and R, cleaned it with R and visualized it using Tableau 9
Trained and supervised undergraduate students
Produced static maps and provided web-based mapping support
Participated in public involvement meetings as a representative of the company/client to present project information, address concerns and provide feedback to impacted residents
Created dynamic data visualizations for reports and presentations to regulators, clients, and the community

Environment: Python, R, Weka, MS SQL Server, Machine Learning, SQL/NoSQL, Linux, Tableau, NumPy, SciPy, Pandas, Scikit-Learn, Seaborn, BeautifulSoup

Confidential

Data Analyst

Cleaned and analyzed geospatial data from GIS domains and ingested it into the Google Maps API as per country-specific security policies using Techmate
Supported the collection, analysis, harmonization, and loading of metadata into a metadata repository
Transformed third-party raw mapping data using a SQL database query tool and distributed the curated data to different business units to meet strict deadlines
Rendered satellite imagery and user edits to develop integrated geographical maps for GPS feeds
Performed Data Profiling utilizing statistics such as minimum, maximum, mean, median, mode, percentile, standard deviation and variations such as count and sum
Reduced marketing cost per AdWords lead by $100
Performed keyword research and built PPC campaigns from ground up - product lifecycle analysis
Tracked sales metrics (ROI, revenue from natural/paid search, CTR, CPC, conversions) for managed search terms and keywords using Google Analytics
Developed organizational strategy and content for web and email marketing campaigns

Environment: Google Analytics, Linux, Shell Scripting, Google AdWords, MS SQL Server, Microsoft Excel

Confidential

Data Analyst

Extracted data from Oracle and MS SQL Server using Informatica to load it into a single data warehouse repository
Synthesized data and produced ad hoc reports using Excel and Crystal Reports
Designed and developed the ETL process from different source systems to transform the data as per the business requirements for use by the reporting teams
Created dimensional model based on star schemas and designed them using ERwin
Participated in client discussions to gather scope information and perform analysis of scope information to provide inputs for project scoping documents
Designed and developed Marketing ad hoc reports using Power BI
Developed Power BI model used for financial reporting of P & L
Wrote calculated columns, measures and queries in Power BI Desktop
Worked with end user to convert old reports into OBIEE reports
Supported process innovation for the Retail business unit by developing Strategic Capacity analysis
Created Business Requirement documents (BRD), Functional & Technical Requirement documents
Analyzed & collected data to assist customers in planning, forecasting, and in managing their business

Environment: MS SQL Server, Oracle, ERwin, MS Visio, Power BI, Microsoft Excel

TECHNICAL SKILLS

Reporting & Analysis: MS Excel, Tableau, Google Analytics, MSBI, SSIS, SSRS

Languages: UNIX Shell, SQL, Java, Python, Scala

Databases: MS SQL Server 2008, MySQL, MS-Access, Oracle 11g R2, MongoDB

Operating Systems: Windows, OS X

Statistical/ Data Mining: Python, R

Python Packages: NumPy, SciPy, Pandas, Scikit-Learn, TensorFlow, Matplotlib, Seaborn, OpenCV, PySpark

Big Data Technologies: Hadoop, Spark, Kafka, Sqoop, Hive, MapReduce, Yarn

Data Operations: GIS, Operational Research, SEO, A/B Testing, Pattern Recognition, Predictive analysis, Visualization, Machine Learning (Supervised & Unsupervised)

Cloud Computing: AWS - S3, EC2

Other Tools: Git, IntelliJ IDE, PyCharm, Anaconda, Spring Boot, Maven, Jenkins
