Data Engineer Resume
SUMMARY
- 5+ years of professional experience in Big Data technologies and IT programming
- Current Role: Data Engineer
- Past Roles: Senior Hadoop Developer
- Industry Sectors: Telecom, Travel
- Hive and HQL database development
- Spark (on Hadoop) programming: Spark Streaming, RDDs, DataFrames, Datasets, and analytics
- Hive joins, join performance tuning, and SerDe with different file formats
- Kafka and Sqoop for data ingestion and import into the Hadoop ecosystem
- 1+ year of professional experience in AWS: EC2, Lambda, Glue ETL, Athena
TECHNICAL SKILLS:
Big Data: Spark, PySpark, Hadoop, MapReduce, Hive, Sqoop, SQL, Python, Glue ETL
Databases: MySQL, NoSQL, HBase, Cassandra
Business Intelligence: Hive, Sqoop, KNIME, Hue
Programming Languages: Python, Java (Core), Linux shell scripting
Tools: Bluefish and Geany editors, ZooKeeper, Jenkins
Operating Systems: Linux (Ubuntu, Mint, CentOS)
PROFESSIONAL EXPERIENCE:
Confidential
Data Engineer
Environment: NiFi, Spark, Cassandra, ML & AI, gRPC, Kubernetes, Docker, Python, Linux Shell Script
Responsibilities:
- Created predictive, industrialized use cases deployed as scalable containerized solutions.
- Worked on two predictive AI models for Bharti Airtel that were commercialized and deployed on Confidential. The models predict degradation 4 hours and 1 hour in advance, respectively, so corrective action can be taken before the degradation occurs.
- Developed feature engineering (FE) scripts in PySpark for the machine learning models, and created a target-selection model to dynamically locate a neighbor cell for data transfer while satisfying multiple conditions (see the sketch after this list).
- Conducted a feasibility study to commercialize POCs by shortlisting those running within Ericsson that could be developed into a full-scale, plug-and-play architecture offered to multiple customers.
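Below is a minimal PySpark sketch of the kind of feature-engineering step described above; the table name, column names, and the 4-hour rolling window are illustrative assumptions, not the actual project code.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = (SparkSession.builder.appName("fe_sketch")
             .enableHiveSupport().getOrCreate())

    # Hypothetical KPI table: one row per cell per 15-minute interval
    kpi = spark.table("kpi_measurements")

    # Rolling aggregates over the previous 4 hours (16 x 15-minute rows) per cell
    w = (Window.partitionBy("cell_id").orderBy("interval_ts")
         .rowsBetween(-16, -1))

    features = (kpi
                .withColumn("kpi_mean_4h", F.avg("kpi_value").over(w))
                .withColumn("kpi_max_4h", F.max("kpi_value").over(w))
                .withColumn("kpi_trend_4h", F.col("kpi_value") - F.col("kpi_mean_4h")))

    features.write.mode("overwrite").saveAsTable("fe_features")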
Confidential
Senior Hadoop Developer
Environment: Hadoop, Spark, Hive, Sqoop, Python, SQL-HQL, Linux Shell Script
Responsibilities:
- Responsible for data modeling.
- Built and configured structured data loading into HDFS using Sqoop or local file copies.
- Developed Jenkins jobs and HQL to load data into staging and base Hive tables (see the sketch after this list).
- Wrote Scala programs to filter and process XML files and to join large JSON data sets.
- Built a canonical data model on HBase.
- Developed Jenkins jobs to automate the pipeline.
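A rough PySpark equivalent of the staging-to-base Hive load described above; connection details, table names, and the partition value are placeholders (the actual pipeline used Sqoop and Jenkins-driven HQL).

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder.appName("stage_to_base")
             .enableHiveSupport().getOrCreate())

    # Pull source rows over JDBC into a Hive staging table
    src = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://db-host:3306/sales")
           .option("dbtable", "orders")
           .option("user", "etl").option("password", "***")
           .load())
    src.write.mode("overwrite").saveAsTable("stage.orders")

    # Promote staged rows into the partitioned base table; the partition value
    # would normally come from the Jenkins job parameters
    spark.sql("""
        INSERT OVERWRITE TABLE base.orders PARTITION (load_dt = '2020-01-01')
        SELECT * FROM stage.orders
    """)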
Environment: Hadoop, Kafka, Stream Topic, Spark Streaming, Hbase, Python, Scala, Shell Script
Responsibilities:
- Responsible for the Kafka producer.
- Configured MapR Streams and topics with partitions.
- Used Spark Streaming to process data and apply HMF rules (see the sketch after this list).
- Responsible for dynamic JSON schema evaluation.
- Enabled upsert operations.
- Responsible for data modeling.
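An illustrative Structured Streaming sketch of the JSON-over-Kafka processing described above; the project itself used MapR Streams with Spark Streaming, and the topic, schema, and write target here are assumptions, with the HBase/MapR-DB upsert only stubbed out.

    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("stream_sketch").getOrCreate()

    schema = StructType([
        StructField("record_id", StringType()),
        StructField("metric", DoubleType()),
    ])

    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker:9092")
           .option("subscribe", "events")
           .load())

    parsed = (raw.select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    def upsert(batch_df, batch_id):
        # Stand-in for the HBase/MapR-DB upsert: deduplicate each micro-batch
        # on the key before persisting it
        batch_df.dropDuplicates(["record_id"]).write.mode("append").parquet("/data/events_base")

    (parsed.writeStream.foreachBatch(upsert)
     .option("checkpointLocation", "/tmp/events_ckpt")
     .start().awaitTermination())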
Environment: Hadoop, Hive, Spark SQL, HBase, JSON, Python, Scala
Responsibilities:
- Created the architecture for Hive master and staging tables integrated with HBase.
- Applied currency conversion rules (see the sketch after this list).
- Enabled upsert operations.
- Responsible for data modeling.
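A small Spark SQL sketch of a currency-conversion step like the one described above; the table and column names are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder.appName("fx_sketch")
             .enableHiveSupport().getOrCreate())

    # Hypothetical staging table with amounts in local currency, plus a daily rates table
    staged = spark.table("stage.transactions")   # columns include amount, currency
    rates = spark.table("ref.fx_rates")          # columns: currency, rate_to_usd

    converted = (staged.join(rates, "currency")
                 .withColumn("amount_usd", F.col("amount") * F.col("rate_to_usd")))

    # Overwrite the target partition as a simple upsert-by-partition; the column
    # list must match the master table definition
    (converted.select("txn_id", "txn_ts", "amount", "currency", "amount_usd")
     .write.mode("overwrite").insertInto("master.transactions"))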
Environment: EC2, Lambda, Glue ETL, Athena, XML, Parquet
Responsibilities:
- Created AWS Lambda functions and Glue ETL jobs (see the sketch after this list).
- Responsible for project architecture and results.
- Responsible for enabling and scheduling the AWS Lambda functions and Glue jobs.
- Responsible for data modeling.
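A minimal sketch of a Lambda handler that starts a Glue ETL job via boto3, roughly as described above; the job name, argument, and trigger are assumptions.

    import boto3

    glue = boto3.client("glue")

    def lambda_handler(event, context):
        # Triggered (e.g. by an S3 event or a schedule) to kick off the Glue ETL
        # job that converts incoming XML to Parquet
        run = glue.start_job_run(
            JobName="xml-to-parquet-etl",
            Arguments={"--source_prefix": event.get("source_prefix", "raw/")},
        )
        return {"job_run_id": run["JobRunId"]}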
Confidential
Associate Hadoop Developer
Environment: Hadoop, Spark, Python, Linux Shell Script
Responsibilities:
- Responsible for the security of client data
- Provided a hassle-free interface to transfer data from one database to another.
- Developed PySpark code to parse data (see the sketch after this list).
- Responsible for mapping equivalent data types between source and target.
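An illustrative PySpark sketch of a database-to-database transfer with explicit type mapping, in the spirit of the work above; the connection details, tables, and casts are placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("db_transfer_sketch").getOrCreate()

    # Read from the source database over JDBC
    src = (spark.read.format("jdbc")
           .option("url", "jdbc:mysql://src-host:3306/app")
           .option("dbtable", "customers")
           .option("user", "etl").option("password", "***")
           .load())

    # Map source types onto the target schema before writing
    mapped = (src.withColumn("customer_id", F.col("customer_id").cast("bigint"))
              .withColumn("signup_date", F.to_date("signup_date")))

    # Write to the target database over JDBC
    (mapped.write.format("jdbc")
     .option("url", "jdbc:mysql://dst-host:3306/dw")
     .option("dbtable", "customers")
     .option("user", "etl").option("password", "***")
     .mode("append").save())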
Python Developer
Environment: Hadoop, Hive, Sqoop, Python, SQL-HQL, Linux Shell Script
Responsibilities:
- Responsible for the designing, coding, testing, and deployment phases of data modeling.
- Assisted in upgrading, configuring, and maintaining Hadoop infrastructure components such as HDFS and Hive.
- Built and configured structured data loading into HDFS using Sqoop.
- Developed UNIX scripts and HQL to load data into staging and base Hive tables after partitioning and bucketing.
- Wrote MapReduce programs to filter and process XML files and to join large data sets (a streaming-mapper sketch follows this list).
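A Hadoop Streaming style mapper in Python illustrating the kind of XML filtering mentioned above; the element names are assumptions, and it simplifies by assuming one complete XML record per input line.

    #!/usr/bin/env python3
    # Hadoop Streaming mapper sketch: keep only records whose status is ACTIVE
    import sys
    import xml.etree.ElementTree as ET

    for line in sys.stdin:
        line = line.strip()
        if not line:
            continue
        try:
            rec = ET.fromstring(line)
        except ET.ParseError:
            continue  # skip malformed records
        if rec.findtext("status") == "ACTIVE":
            # Emit key<TAB>value so a reducer can join on the key downstream
            sys.stdout.write(f"{rec.findtext('id')}\t{rec.findtext('amount')}\n")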
Environment: Linux shell scripts, Python, MySQL, Hadoop, Hive, Sqoop, AWS (cloud), EC2
Responsibilities:
- This is a script-based project.
- Records for 15 companies are managed in MySQL, while records for the other 15 companies are managed in Hive (combined as sketched below).
- The stock market analysis is broadly divided into two categories.
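An illustrative PySpark sketch combining the MySQL- and Hive-managed company records described above; connection details, table names, and the sample aggregation are assumptions.

    from pyspark.sql import SparkSession, functions as F

    spark = (SparkSession.builder.appName("stock_sketch")
             .enableHiveSupport().getOrCreate())

    # 15 companies live in MySQL, the other 15 in Hive
    mysql_stocks = (spark.read.format("jdbc")
                    .option("url", "jdbc:mysql://db-host:3306/market")
                    .option("dbtable", "daily_prices")
                    .option("user", "etl").option("password", "***")
                    .load())
    hive_stocks = spark.table("market.daily_prices")

    all_stocks = mysql_stocks.unionByName(hive_stocks)

    # Example analysis: average closing price per company
    all_stocks.groupBy("symbol").agg(F.avg("close").alias("avg_close")).show()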