Big Data Developer Resume

Houston, TX

PROFESSIONAL SUMMARY:

  • Around 5 years of IT experience in project development, implementation, deployment, and maintenance using data science, analysis, and Big Data Hadoop ecosystem technologies in the insurance, health care, and retail sectors, with expertise in multiple programming languages, including Python, SQL, R, and Java.
  • Experience in using Hadoop and its ecosystem components such as HDFS, MapReduce, YARN, Spark, Hive, Pig, HBase, ZooKeeper, Oozie, Flume, Storm, and Sqoop.
  • 2 years of experience in the Hadoop ecosystem, including real-time data processing using Storm and Spark and deploying multi-node clusters using Cloudera and Hortonworks distributions.
  • Hands-on experience with Hadoop applications, including administration, configuration management, monitoring, debugging, and performance tuning.
  • Strong experience creating real-time data streaming solutions using Apache Spark Core, Spark SQL, and DataFrames.
  • Hands-on experience with Spark Streaming to receive real-time data from Kafka.
  • Developed simple to complex MapReduce streaming jobs.
  • Developed UDFs.
  • Involved in building and evolving a reporting framework on top of the Hadoop cluster to facilitate data mining, analytics, and dashboarding.
  • Used Pig as an ETL tool to perform transformations, event joins, filters, and some pre-aggregations.
  • Good knowledge of analyzing data using Pig Latin and HiveQL.
  • Developed Sqoop scripts for large dataset transfers between Hadoop and RDBMSs.
  • Worked on NoSQL databases including HBase, Cassandra, and MongoDB.
  • Collected, cleaned, and analyzed high-volume datasets acquired from relational databases using SQL and Tableau.
  • Created and worked with Sqoop jobs with incremental load to populate Hive external tables.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
  • Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
  • Good knowledge of Amazon Web Services components such as EC2, EMR, and S3.
  • Experience with and knowledge of NoSQL databases such as HBase, Cassandra, or MongoDB.
  • Good understanding of Cassandra Data Modelling based on applications.
  • Experienced in performing statistical analysis on data to evaluate and isolate the critical factors influencing trends and relationships of variables to drive improved results. Well-versed in writing complex SQL queries, programming in Python, and using Tableau to analyze data.
  • Collected, cleaned, and analyzed large raw user datasets using Python libraries such as Pandas and NumPy, and used machine learning algorithms to evaluate prediction accuracy.
  • Experience using machine learning algorithms along with Spark machine learning libraries.
  • Developed and tested algorithms to measure the efficiency and accuracy of models that were about to be deployed.
  • Good working knowledge of Natural Language Processing and Deep Learning
  • Performed data cleaning and feature selection using the MLlib package in PySpark.
  • Understanding of deep learning using CNN, RNN, ANN, reinforcement learning, and transfer learning.
  • Performed forecast analysis of data during the fiscal year.
  • Conducted data extraction and data manipulation over large relational datasets using R and SQL.
  • Developed predictive models and statistical analyses for internal business, using hypothesis testing to validate data and logistic regression modeling to target responsive customers and minimize risk.
  • Good working knowledge of BI tools such as Tableau.
  • Participated in requirement analysis, reviews, and working sessions to understand the requirements and system design.
  • Good interpersonal skills and ability to work as part of a team.
  • Exceptional ability to learn and master new technologies and to deliver results on short deadlines. Good team player with the ability to solve problems and to organize and prioritize multiple tasks.

TECHNICAL SKILLS:

Hadoop/Big Data: Hadoop (YARN), HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, Storm, ZooKeeper, Oozie, Tez, Impala, Mahout, Ganglia, Nagios.

Programming/Scripting Languages: Python, R, C, Core Java, SQL, C#, Scala

Databases: MySQL, SQL Server 2005/2008

NoSQL Databases: HBase, Cassandra, MongoDB

Java: Core Java, JDBC, Web services

ETL Tools: Informatica

Visualization: Tableau, MS Excel, Shiny, Matplotlib, Seaborn, ggplot

Cluster Management and Monitoring: Cloudera Manager, Ambari, Ganglia, Nagios

Modeling languages: UML Design, Use case, Class, Sequence, Deployment and Component diagrams.

Cloud Environment: AWS

Methodologies: Agile/Scrum, Rational Unified Process, and Waterfall.

Operating Systems: Windows 98/2000/XP/Vista/7/8/10, Macintosh, Unix, Linux

PROFESSIONAL EXPERIENCE:

Confidential, Houston, TX

Big Data Developer

Responsibilities:

  • Handled importing of data from various data sources, performed transformations using Hive, Pig and loaded data into HDFS for aggregations.
  • Participated in design reviews and daily project scrums.
  • Worked with different AWS services such as S3, EC2, EMR, Redshift, and Kinesis for storing, analyzing, and streaming data.
  • Worked closely with the business analysts to convert the Business Requirements into Technical Requirements and prepared low- and high-level documentation.
  • Hands on experience in joining raw data with the data using Pig scripting.
  • Wrote custom UDFs in Hive.
  • Hands-on experience extracting data from different databases and copying it into HDFS using Sqoop.
  • Created an Oozie coordinator workflow to execute the Sqoop incremental job daily.
  • Used Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in installing and configuring Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
  • Good knowledge of HBase.
  • Extensive experience with the Python programming language.
  • Worked with Spark, PySpark, Spark Streaming, and the machine learning library in PySpark (MLlib).
  • Used Spark Streaming to stream real-time data to the website, which provided updates.
  • Used visualization tools like Shiny and Tableau
  • Integrated Kafka with Spark Streaming and worked on a Spark Streaming use case using Kafka.
  • Used Spark on AWS for stream processing and to construct prediction models.
  • Used different AWS services for on-premises and cloud computing and integrated some key products into AWS.
  • Worked with Amazon EMR clusters on EC2 to utilize spot capacity for short-term computation and streaming.
  • Hands-on experience with AWS machine learning services such as SageMaker, Personalize, DeepLens, and Textract.
  • Communicated deliverable status to users, stakeholders, and the client, and drove periodic review meetings.
  • Completed tasks and the project on time per quality goals.

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, HBase, Oozie, MySQL, SVN, PuTTY, ZooKeeper, UNIX, Shell scripting, Spark, Spark Streaming

Confidential, Columbus, OH

Big Data Developer

Responsibilities:

  • Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the HDP cluster using Ambari.
  • Developed custom MapReduce programs and custom user-defined functions (UDFs) in Hive to transform large volumes of data according to business requirements.
  • Worked with big data analysts, designers, and scientists in troubleshooting MapReduce job failures and issues with Hive, Pig, Flume, etc.
  • Involved in HBase setup and storing data into HBase, which was used for further analysis.
  • Deployed a remote Hive Metastore using MySQL.
  • Assisted in Implementing High Availability of HDFS to avoid single point of failure (SPOF) using Quorum Journal Manager (QJM)
  • Wrote custom UDFs in Hive.
  • Imported weblogs from the web servers into HDFS using Flume.
  • Worked on analyzing data with Hive and Pig.
  • Performed Sqoop imports from data sources to HDFS.
  • Imported data from different sources using Sqoop for visualization and to generate reports.
  • Automated processes using the Oozie scheduler.
  • Wrote custom shell scripts for automating repetitive tasks on the cluster.
  • Wrote a Sqoop incremental import job to move new and updated data from the database to HDFS.
  • Worked with clients on requirements based on their business needs.
  • Communicated deliverable status to users, stakeholders, and the client, and drove periodic review meetings.
  • Completed tasks and the project on time per quality goals.

Environment: Hortonworks Hadoop, Ambari, MongoDB, Linux, HDFS, Hive, Pig, Sqoop, ZooKeeper

Confidential

Data Analyst

Responsibilities:

  • Responsible for configuring Data Loader, uploading data in CSV files into Salesforce, and checking the integrity of the data.
  • Conducted data extraction and data manipulation over large relational datasets using R and SQL.
  • Performed data profiling by writing PL/SQL queries.
  • Experienced in creating UNIX scripts for file transfer and manipulation.
  • Developed an algorithm using R.
  • Generated SQL scripts and deployed Medicaid and Medicare databases, including configuration.
  • Used custom SQL to pull data into Tableau Desktop and validated the results in Teradata and Snowflake databases.
  • Used Spark, Python, and R for a regular expression project in the Hadoop/Hive environment on Linux/Windows for big data resources.
  • Created views using Tableau Desktop and Server that were published to the internal team for review, further data analysis, and customization using filters and actions. Created custom objects and fields (Leads, Dashboard, Sales) using Tableau.
  • Experienced in data migration and transformation from existing data stores to Hadoop. Developed PySpark modules for predictive analytics in Hadoop on AWS. Designed and implemented system architecture for Amazon EC2 instances.
  • Worked on Amazon Redshift and AWS, architecting a solution to load data, create data models, and run BI on it.
  • Imported data from a proprietary database into the Hadoop ecosystem. Used Sqoop commands to extract data from Hive into an Oracle table.
  • Wrote a Python library to consume text data from various data sources and serialize it into a Pandas DataFrame.
  • Analyzed large datasets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
  • Used Pandas, NumPy, Seaborn, Matplotlib, and scikit-learn in Python to develop various machine learning models such as logistic regression, KNN, and gradient boosting. Conducted exploratory data analysis using NumPy and Pandas.
  • Performed statistical analysis to identify changes in data patterns and generated periodic reports based on the data using SSRS.
  • Wrote SQL queries to retrieve related data to support statistical analysis. Used Tableau for PL/SQL-queried data, data analysis, report generation, and statistical analysis. Conducted data mining, data modeling, and statistical analysis.
  • Developed MapReduce jobs using Hive and reviewed Hadoop log files. Used Pig for data cleansing; understanding of YARN.

Environment: Hadoop, Hive, SQL, Python, Tableau, SAS, AWS, Excel, Machine Learning Algorithms, Deep Learning
